TensorFlow Environment Setup
Windows GPU Installation
Dependencies
- TensorFlow 1.5.0/1.6.0
- CUDA v9.0
- cuDNN v7.0.5 for CUDA 9.0
After extracting cuDNN v7.0.5, copy its folders (bin, include, lib) into the CUDA installation directory (NVIDIA GPU Computing Toolkit/CUDA/v9.0).
The versions above must match each other, otherwise you will hit version-mismatch errors; also take care to download the builds for the correct operating system.
Python Environment Setup (training/development environment)
For the training environment we recommend Anaconda, a popular Python platform for data science. It comes with many libraries preinstalled and makes it easy to manage multiple Python versions and switch between environments freely.
TensorFlow is built on the gRPC framework and uses Protocol Buffers as its data-interchange format. The protoc tool is a compiler that turns .proto definition files into code usable from multiple target languages.
Version 3.4.0 is used here; the compilation commands may differ in newer versions, so to avoid errors later on it is safest to use 3.4.0 directly.
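For illustration, once the protos are compiled (see the Protobuf Compilation step below), the generated *_pb2 modules can be used directly from Python. A minimal sketch, assuming the faster_rcnn_resnet101.config shown later in this document sits in the current directory:

import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2  # generated by protoc

# Parse a pipeline config file through the compiled protobuf classes.
pipeline_config = pipeline_pb2.TrainEvalPipelineConfig()
with tf.gfile.GFile('faster_rcnn_resnet101.config', 'r') as f:
    text_format.Merge(f.read(), pipeline_config)
print(pipeline_config.model.faster_rcnn.num_classes)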
Installation
- Download and install Anaconda
- Configure environment variables: append
<install dir>\Anaconda3;<install dir>\Anaconda3\Scripts;<install dir>\Anaconda3\Library\bin;
to PATH (the system environment variable).
- Configure a domestic (China) mirror
# Add the TUNA mirror for Anaconda
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
# Show channel URLs when searching
conda config --set show_channel_urls yes
- Set up Python environments
# List the Python environments currently on the system
conda info --envs
# Create an environment with a specific Python version
conda create --name py35 python=3.5
# Switch to the py35 environment
activate py35
# Switch back to the previous Python environment
deactivate
# Remove an environment
conda remove --name py35 --all
Installing TensorFlow in a Python 3 Environment
Install TensorFlow:
# For CPU
pip install tensorflow==1.6
# For GPU
pip install tensorflow-gpu==1.6
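To verify the install (and, for the GPU build, that CUDA 9.0 and cuDNN 7.0.5 are found), a quick check from Python:

import tensorflow as tf
print(tf.__version__)              # expect 1.6.0
print(tf.test.is_gpu_available())  # True when the GPU build sees CUDA/cuDNN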
Then install the remaining dependencies with pip:
pip install Cython
pip install pillow
pip install lxml
pip install jupyter
pip install matplotlib
Build Preparation for the Model Training Project
- Protobuf Compilation
# From the directory that contains object_detection (tensorflow/models/research)
protoc object_detection/protos/*.proto --python_out=.
- Add Libraries to PYTHONPATH
1. Create a new txt file under <your Anaconda3 install path>/Anaconda3/Lib/site-packages
(the install path here is C:\ProgramData\Anaconda3\Lib\site-packages); if you have other Python environments installed, create the txt file in the corresponding environment directory instead (Anaconda3\envs\py35\Lib\site-packages).
2. In the new txt file, write the directory paths of your TensorFlow object_detection project:
F:\project\project
F:\project\project\slim
3. Rename the file to tensorflow_model.pth (note: the extension must be .pth)
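A .pth file under site-packages is read at interpreter startup, and every line in it is appended to sys.path; that is what makes the object_detection imports work. A quick sanity check (the path filter follows the example paths above):

import sys
# The two project paths from tensorflow_model.pth should show up here:
print([p for p in sys.path if p.startswith('F:')])
# If they do, this import should now succeed:
from object_detection.utils import label_map_util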
- Testing the Installation
# From tensorflow/models/research/
python object_detection/builders/model_builder_test.py
Model Training
Sample Annotation
Use the label_images tool to annotate the images, producing annotation files in Pascal VOC format.
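For reference, a Pascal VOC annotation file looks like the sketch below (file name and values are made up); the xml_to_csv function in the script further down reads exactly these fields (path, size, and each object's name and bndbox):

<annotation>
    <path>F:/samples/template/images/img_0001.jpg</path>
    <size>
        <width>1280</width>
        <height>1024</height>
        <depth>3</depth>
    </size>
    <object>
        <name>a_hn_101</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>48</xmin>
            <ymin>240</ymin>
            <xmax>195</xmax>
            <ymax>371</ymax>
        </bndbox>
    </object>
</annotation>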
Generating the TFRecord Files TensorFlow Expects
Working directory layout:
|- template
| |- annotations (annotation xml files)
| |- images (sample images)
| |- label_maps
| | |- *.pbtxt (label map files; ids start from 1)
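A minimal label map (*.pbtxt) sketch; the class names here are hypothetical (matching the one that appears commented out in the script below), and ids must start from 1 because 0 is reserved for the background:

item {
  id: 1
  name: 'a_hn_101'
}
item {
  id: 2
  name: 'a_hn_102'
}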
Script tool - tfrecord_util.py (Python 3 environment)
import os
import io
import tensorflow as tf
from PIL import Image
from object_detection.utils import dataset_util
from object_detection.utils import label_map_util
from collections import namedtuple
import glob
import pandas as pd
import xml.etree.ElementTree as ET
# Edit this: the directory that contains the "template" folder
current_path = '<directory containing template>'
train_path = os.path.join(current_path, "template")
# Directory of the image annotation files
annotations_dir = os.path.join(train_path, "annotations")
# Image directory
images_path = os.path.join(train_path, "images")
# Label map files
labels_path = os.path.join(train_path, "label_maps")
labels_file = os.path.join(labels_path, "mscoco_label_map.pbtxt")
# csv file (full path)
csv_file = os.path.join(train_path, "temp_csv_name.csv")
# record file (full path)
tf_record_file = os.path.join(train_path, "tf_record_file.record")
# ---------------------------------------------------------------------- xml operator
def xml_to_csv(path):
xml_list = []
for xml_file in glob.glob(path + '/*.xml'):
tree = ET.parse(xml_file)
root = tree.getroot()
for member in root.findall('object'):
# if member[0].text != 'a_hn_101':
# continue
file_path = root.find('path').text
filename = file_path.split("/")[-1].split("\\")[-1]
value = (filename,
int(root.find('size')[0].text),
int(root.find('size')[1].text),
member[0].text,
int(member[4][0].text),
int(member[4][1].text),
int(member[4][2].text),
int(member[4][3].text)
)
xml_list.append(value)
column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
xml_df = pd.DataFrame(xml_list, columns=column_name)
return xml_df
# ---------------------------------------------------------------------- tfrecord operator
classes_num = 100
label_map = label_map_util.load_labelmap(labels_file)
print("success loading label map file["+str(labels_file)+"]")
# print('\n-------------label_map------------------\n')
# print(label_map)
# categories array [{'id':id,'name':name},···]
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=classes_num, use_display_name=True)
# category_index dic {id : {'id':id,'name':name}, ···}
# category_index = label_map_util.create_category_index(categories)
# category_index dic {name : {'id':id,'name':name}, ···}
category_index = {}
for cat in categories:
category_index[cat['name']] = cat
print(category_index)
print("success generating categories dic")
def class_text_to_int(row_label):
if row_label in category_index.keys():
# print(str(category_index[row_label]['id']))
return category_index[row_label]['id']
else:
# print(row_label)
return 0
def split(df, group):
data = namedtuple('data', ['filename', 'object'])
gb = df.groupby(group)
return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]
def create_tf_example(group, path):
with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
encoded_jpg = fid.read()
encoded_jpg_io = io.BytesIO(encoded_jpg)
image = Image.open(encoded_jpg_io)
width, height = image.size
filename = group.filename.encode('utf8')
# image_format = b'jpg'
if image.format != 'JPEG':
print(group.filename)
raise ValueError('Image format not JPEG')
else:
image_format = b'jpg'
xmins = []
xmaxs = []
ymins = []
ymaxs = []
classes_text = []
classes = []
for index, row in group.object.iterrows():
if class_text_to_int(row['class']) == 0:
print(group.filename)
# print(row['class'].encode('utf8'))
continue
xmins.append(row['xmin'] / width)
xmaxs.append(row['xmax'] / width)
ymins.append(row['ymin'] / height)
ymaxs.append(row['ymax'] / height)
classes_text.append(row['class'].encode('utf8'))
classes.append(class_text_to_int(row['class']))
tf_example = tf.train.Example(features=tf.train.Features(feature={
'image/height': dataset_util.int64_feature(height),
'image/width': dataset_util.int64_feature(width),
'image/filename': dataset_util.bytes_feature(filename),
'image/source_id': dataset_util.bytes_feature(filename),
'image/encoded': dataset_util.bytes_feature(encoded_jpg),
'image/format': dataset_util.bytes_feature(image_format),
'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
# ----------------------------------------------------------------------
def generate_tf_record_file(recreate=True):
    """
    Generate the TensorFlow record file from the annotation xml files that
    belong to the sample images.
    :param recreate: whether to create a new record file
    :return: tf_record_file path
    """
if recreate:
        # 1. Read all xml files under the annotations directory and convert them into one csv file
xml_df = xml_to_csv(annotations_dir)
xml_df.to_csv(csv_file, index=None)
print('Successfully converted xml['+str(annotations_dir)+'] to csv['+str(csv_file)+'].')
print(csv_file)
        # 2. Convert the csv file into a record file
examples = pd.read_csv(csv_file)
grouped = split(examples, 'filename')
writer = tf.python_io.TFRecordWriter(tf_record_file)
for group in grouped:
try:
tf_example = create_tf_example(group, images_path)
            except Exception:
print(group.filename)
continue
writer.write(tf_example.SerializeToString())
writer.close()
print('Successfully created the TFRecords: {}'.format(tf_record_file))
return tf_record_file
else:
        # TODO - look up the existing file
return None
def main(_):
my_tf_record_file = generate_tf_record_file()
print(my_tf_record_file)
if __name__ == '__main__':
tf.app.run()
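A usage note (assumed invocation, based on the imports above): edit current_path first, then run the script from the object_detection project root (the same directory used for the PYTHONPATH setup) so the object_detection package can be imported:
# In the py35 environment, from the object_detection project directory
python tfrecord_util.py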
Model Training Configuration
Configuration file: faster_rcnn_resnet101.config
# Faster R-CNN with Resnet-101 (v1) configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.
model {
faster_rcnn {
num_classes: 23
image_resizer {
keep_aspect_ratio_resizer {
min_dimension: 1024
max_dimension: 1280
}
}
feature_extractor {
type: 'faster_rcnn_resnet101'
first_stage_features_stride: 16
}
first_stage_anchor_generator {
grid_anchor_generator {
scales: [0.25, 0.5, 1.0, 2.0]
aspect_ratios: [0.5, 1.0, 2.0]
height_stride: 16
width_stride: 16
}
}
first_stage_box_predictor_conv_hyperparams {
op: CONV
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
truncated_normal_initializer {
stddev: 0.01
}
}
}
first_stage_nms_score_threshold: 0.0
first_stage_nms_iou_threshold: 0.6
first_stage_max_proposals: 400
first_stage_localization_loss_weight: 2.0
first_stage_objectness_loss_weight: 1.0
initial_crop_size: 14
maxpool_kernel_size: 2
maxpool_stride: 2
second_stage_box_predictor {
mask_rcnn_box_predictor {
use_dropout: false
dropout_keep_probability: 1.0
fc_hyperparams {
op: FC
regularizer {
l2_regularizer {
weight: 0.0
}
}
initializer {
variance_scaling_initializer {
factor: 1.0
uniform: true
mode: FAN_AVG
}
}
}
}
}
second_stage_post_processing {
batch_non_max_suppression {
score_threshold: 0.0
iou_threshold: 0.7
max_detections_per_class: 100
max_total_detections: 300
}
score_converter: SOFTMAX
}
second_stage_localization_loss_weight: 2.0
second_stage_classification_loss_weight: 1.0
}
}
train_config: {
batch_size: 1
optimizer {
momentum_optimizer: {
learning_rate: {
manual_step_learning_rate {
initial_learning_rate: 0.0002
schedule {
step: 900000
learning_rate: .00003
}
schedule {
step: 1200000
learning_rate: .000003
}
}
}
momentum_optimizer_value: 0.9
}
use_moving_average: false
}
gradient_clipping_by_norm: 10.0
# fine_tune_checkpoint: "F:/project/project/faster_rcnn_resnet101_coco_2018_01_28/model.ckpt"
# from_detection_checkpoint: true
  # Note: uncommenting the line below caps training at 10k steps (the original
  # sample config used 200k, which was empirically found sufficient for the
  # pets dataset). Any cap below 900k effectively bypasses the learning-rate
  # schedule above (the rate would never decay). Leave it commented out to
  # train indefinitely.
  #num_steps: 10000
data_augmentation_options {
random_adjust_brightness {
max_delta: 0.1
}
}
data_augmentation_options {
random_image_scale {
min_scale_ratio: 0.8
max_scale_ratio: 1.2
}
}
#data_augmentation_options {
# random_crop_to_aspect_ratio {
# }
#}
#data_augmentation_options {
# random_adjust_contrast {
# min_delta: 0.5
# max_delta: 1.5
# }
#}
#data_augmentation_options {
# random_adjust_saturation {
# min_delta: 0.5
# max_delta: 1.5
# }
#}
}
train_input_reader: {
tf_record_input_reader {
input_path: "D:/Workspace/train_dir/all/tf_record_file_23_3035_20180724.record"
}
label_map_path: "D:/Workspace/train_dir/all/mscoco_label_map_23.pbtxt"
shuffle: true
}
eval_config: {
# num_examples: 1
num_visualizations: 200
# Note: The below line limits the evaluation process to 10 evaluations.
# Remove the below line to evaluate indefinitely.
max_evals: 2
visualization_export_dir: "D:/Workspace/train_dir/all/20180724/eval/exportfrcnn"
}
eval_input_reader: {
tf_record_input_reader {
input_path: "D:/Workspace/train_dir/all/tf_record_file_23_3035_20180724_eval.record"
}
label_map_path: "D:/Workspace/train_dir/all/mscoco_label_map_23.pbtxt"
shuffle: true
num_readers: 5
num_epochs: 1
}
The configuration file has five main parts:
- model: defines the neural-network model structure and its hyperparameters
- train_config: training-related settings
- train_input_reader: settings for reading training samples
- eval_config: model-evaluation settings
- eval_input_reader: settings for reading evaluation samples
model section
- num_classes: the total number of object classes to detect (i.e. the number of distinct labels in the label map)
- keep_aspect_ratio_resizer.min_dimension / keep_aspect_ratio_resizer.max_dimension: control the size input samples are resized to
- feature_extractor.first_stage_features_stride: feature-extraction stride of the first stage. 16 is usually fine; if the samples contain dense, small SKUs (mostly far shots) and results at 16 are poor, consider reducing it to 8
- grid_anchor_generator.height_stride / grid_anchor_generator.width_stride: sliding stride of the anchor boxes during training. 16 is usually fine; as above, if SKUs are dense and small, consider reducing to 8
- first_stage_nms_iou_threshold: IoU threshold of the first-stage NMS. Lowering it can raise recall at the possible cost of precision; range 0~1
- first_stage_max_proposals: number of region proposals kept by the first stage. Raising it can raise recall at the possible cost of precision
- batch_non_max_suppression.iou_threshold: IoU threshold of the second stage. Lowering it can raise recall at the possible cost of precision; range 0~1
- batch_non_max_suppression.max_detections_per_class: maximum number of detections per class
- batch_non_max_suppression.max_total_detections: maximum number of detections overall
train_config section
- initial_learning_rate: the initial learning rate; 0.0003 or 0.0002 both work
- data_augmentation_options: data-augmentation options
- random_adjust_brightness: randomly adjust brightness
- random_image_scale: randomly scale the image size
- random_crop_to_aspect_ratio: randomly crop to the given aspect ratio
The augmentations above are the most commonly used ones.
train_input_reader section
- tf_record_input_reader.input_path: path of the training tfrecord file
- label_map_path: path of the label map file
- shuffle: whether to shuffle the samples so they are fed in random order during training
eval_config section
- num_visualizations: number of images exported during evaluation; choose it according to the evaluation inputs. It does not need to be large and is mainly for visualizing evaluation results
- visualization_export_dir: directory where the evaluation images are saved
eval_input_reader section
- tf_record_input_reader.input_path: path of the evaluation tfrecord file
- label_map_path: path of the label map file
- shuffle: whether to shuffle the samples so they are read in random order
- num_epochs: number of passes over the evaluation samples; usually no need to change it
Training, Evaluation, and Export
Training:
# From the directory of the object_detection project, run the following command
python object_detection/train.py --logtostderr --pipeline_config_path=F:/Workspaces/hongniu3sku/train/faster_rcnn_resnet101_20180530.config --train_dir=F:/Workspaces/hongniu3sku/train/train_data/train/20180530
# pipeline_config_path: path of the training configuration file
# train_dir: directory where intermediate training files are saved
Evaluation:
# From the directory of the object_detection project, run the following command
python object_detection/eval.py --logtostderr --pipeline_config_path=F:/Workspaces/hongniu3sku/train/faster_rcnn_resnet101_20180530.config --checkpoint_dir=F:/Workspaces/hongniu3sku/train/train_data/train/20180530 --eval_dir=F:/Workspaces/hongniu3sku/train/train_data/eval/20180530
# pipeline_config_path: path of the training configuration file
# checkpoint_dir: directory where the intermediate training files were saved
# eval_dir: directory where intermediate evaluation files are saved
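Training and evaluation progress can also be monitored with TensorBoard by pointing it at the same directories (path reused from the commands above):
tensorboard --logdir=F:/Workspaces/hongniu3sku/train/train_data/train/20180530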
Exporting the model:
# From the directory of the object_detection project, run the following command
python object_detection/export_inference_graph.py --input_type image_tensor --pipeline_config_path=F:/Workspaces/hongniu3sku/train/faster_rcnn_resnet101_20180530.config --trained_checkpoint_prefix=F:/Workspaces/hongniu3sku/train/train_data/train/20180530/model.ckpt-157978 --output_directory=F:/Workspaces/hongniu3sku/train/train_data/export/20180530
# pipeline_config_path: path of the training configuration file
# trained_checkpoint_prefix: checkpoint used for the export; model.ckpt-<number> selects which training step's parameters go into the final model
# output_directory: directory where the final model is written
The exported files are:
|- saved_model
| |- variables
| |- saved_model.pb (model file used by TensorFlow Serving)
|- checkpoint (checkpoint record file)
|- frozen_inference_graph.pb (inference graph with parameters frozen in)
|- model.ckpt.* (model data: parameters, structure, etc.)
After every training run, it is recommended to keep checkpoint, frozen_inference_graph.pb, and model.ckpt.* so the model can be optimized further later on.
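As a final sketch, the exported frozen_inference_graph.pb can be loaded for inference as below (the paths and test image are hypothetical; the tensor names are the standard ones produced by export_inference_graph.py):

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the frozen graph exported above.
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('export/20180530/frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session(graph=graph) as sess:
    # Batch of one RGB image, shape [1, height, width, 3].
    image = np.expand_dims(np.array(Image.open('test.jpg')), axis=0)
    boxes, scores, classes, num = sess.run(
        ['detection_boxes:0', 'detection_scores:0',
         'detection_classes:0', 'num_detections:0'],
        feed_dict={'image_tensor:0': image})
    print(scores[0][:5], classes[0][:5])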