This article uses Faster R-CNN as an example to show how to modify MMDetection v2 config files in order to train on a custom dataset in VOC format.
Update 2021-09-01: adapted for MMDetection v2.16.
Series:
- MMDetection v2 Object Detection (1): Environment Setup
- MMDetection v2 Object Detection (2): Data Preparation
- MMDetection v2 Object Detection (3): Config Modification
- MMDetection v2 Object Detection (4): Model Training and Testing
Server environment:
- Ubuntu: 18.04.5
- CUDA: 10.1.243
- Python: 3.7.9
- PyTorch: 1.5.1
- MMDetection: 2.16.0
1 Modify the Base Config
The directory structure of `./configs/_base_`:
_base_
├─ datasets
├─ models
├─ schedules
└─ default_runtime.py
As shown above, there are four kinds of configs:
- datasets: dataset definitions
- models: model architectures
- schedules: training schedules
- default_runtime.py: runtime settings
Open `./configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py`:
_base_ = [
    '../_base_/models/faster_rcnn_r50_fpn.py',
    '../_base_/datasets/coco_detection.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
Change the path of the dataset config:
_base_ = [
    '../_base_/models/faster_rcnn_r50_fpn.py',
    '../_base_/datasets/voc0712.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
2 Modify the Dataset Config
Open `./configs/_base_/datasets/voc0712.py`:
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1000, 600),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=[
                data_root + 'VOC2007/ImageSets/Main/trainval.txt',
                data_root + 'VOC2012/ImageSets/Main/trainval.txt'
            ],
            img_prefix=[data_root + 'VOC2007/', data_root + 'VOC2012/'],
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='mAP')  # interval is in epochs
- Modify the dataset paths `data_root`, `ann_file`, and `img_prefix`, adjust the repeat count `times` (the `RepeatDataset` wrapper is dropped here), and add the label classes `classes`:
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/MyDataset/'
classes = ('car', 'pedestrian', 'cyclist')
data = dict(
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/train.txt',
        img_prefix=data_root,
        pipeline=train_pipeline,
        classes=classes),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/val.txt',
        img_prefix=data_root,
        pipeline=test_pipeline,
        classes=classes),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/test.txt',
        img_prefix=data_root,
        pipeline=test_pipeline,
        classes=classes))
Tips: the `MyDataset` part of `data_root` can be replaced with any name you choose for your custom dataset.
- Add data augmentation, and change the image scale `img_scale`:
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='AutoAugment',
        policies=[
            [dict(
                type='Rotate',
                level=5,
                img_fill_val=(124, 116, 104),
                prob=0.5,
                scale=1)
            ],
            [dict(type='Rotate', level=7, img_fill_val=(124, 116, 104)),
             dict(
                 type='Translate',
                 level=5,
                 prob=0.5,
                 img_fill_val=(124, 116, 104))
            ],
        ]),
    # single-scale
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    # multi-scale (use instead of the single-scale Resize above)
    # dict(
    #     type='Resize',
    #     img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
    #                (1333, 768), (1333, 800)],
    #     multiscale_mode='value',
    #     keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img']),
        ])
]
Tips:
If `img_scale` is a float, it is used directly as the scale factor.
If `img_scale` is a pair of integers, the scale factor is computed from the long and short sides, so that neither side of the rescaled image exceeds the corresponding limit; the image is then resized by that factor.
Setting multiple pairs enables multi-scale training.
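The scale-factor rule above can be sketched in a few lines. This is a simplified illustration only, not mmdet's actual resize implementation:

```python
def rescale_factor(img_w, img_h, img_scale):
    """Sketch of keep_ratio=True resizing: a float img_scale is used
    directly; a (long, short) tuple yields the largest factor such that
    neither side of the rescaled image exceeds its limit."""
    if isinstance(img_scale, float):
        return img_scale
    long_edge, short_edge = max(img_scale), min(img_scale)
    return min(long_edge / max(img_w, img_h),
               short_edge / min(img_w, img_h))

# e.g. a 500x375 image with img_scale=(1000, 600):
# min(1000/500, 600/375) = min(2.0, 1.6) = 1.6
```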
Note: the official docs recommend replacing `ImageToTensor` in `test_pipeline` with `DefaultFormatBundle`.
3 Modify the Model Config
Open `./configs/_base_/models/faster_rcnn_r50_fpn.py`:
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=80,  # change to the number of classes in your dataset
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))))
Change the number of classes `num_classes` in `roi_head`:
model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=3,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))))
Note: since MMDetection v2.0, the number of classes no longer needs an extra 1 for the background class.
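With the three classes used in this post, that means (a trivial illustration):

```python
classes = ('car', 'pedestrian', 'cyclist')
num_classes = len(classes)  # 3: exactly len(classes), no +1 for background
```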
4 Modify the Schedule Config
Open `./configs/_base_/schedules/schedule_1x.py`:
# optimizer
optimizer = dict(
    type='SGD',  # can be 'SGD', 'Adadelta', 'Adagrad', 'Adam', 'RMSprop', etc.
    lr=0.02,
    momentum=0.9,
    weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',  # can be 'step', 'cyclic', 'poly', 'CosineAnnealing', etc.
    warmup='linear',  # can be 'constant', 'linear', 'exp', or None
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
Change the learning rate `lr` and the number of epochs `max_epochs`:
# optimizer
optimizer = dict(
    type='SGD',
    lr=0.02 / 8,
    momentum=0.9,
    weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[7])
runner = dict(type='EpochBasedRunner', max_epochs=8)
Tips: Faster R-CNN's default learning rate `lr=0.02` corresponds to a total batch size of `batch_size=16`. Scale the learning rate linearly to match your actual batch size.
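The linear scaling rule can be written as a small helper. This is a sketch; the parameter names are illustrative, not an MMDetection API:

```python
def scale_lr(base_lr=0.02, base_batch_size=16, num_gpus=1, samples_per_gpu=2):
    """Scale the base learning rate by the ratio of the actual total
    batch size to the batch size the base lr was tuned for."""
    return base_lr * (num_gpus * samples_per_gpu) / base_batch_size

# 1 GPU x 2 images = batch size 2 -> lr = 0.02 * 2 / 16 = 0.0025 (= 0.02 / 8)
```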
5 Modify the Runtime Config
Open `./configs/_base_/default_runtime.py`:
checkpoint_config = dict(interval=1)  # interval is in epochs
# yapf:disable
log_config = dict(
    interval=50,  # interval is in iterations
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]  # can also be [('train', 1), ('val', 1)]
Change the logging interval `interval` in `log_config`, and enable the TensorBoard logger:
log_config = dict(
    interval=100,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
6 Create a Custom Config
Alternatively, all the changes from steps 1-5 can be written in a single file. This makes it easier to manage different configs and avoids errors caused by repeatedly editing the base files.
- Go to the `configs` directory:
cd configs
- Create a directory for custom configs:
mkdir myconfig
- In `./myconfig`, create `faster_rcnn_r50_fpn_1x_mydataset.py`:
# base configs
_base_ = [
    '../_base_/models/faster_rcnn_r50_fpn.py',
    '../_base_/datasets/voc0712.py',
    '../_base_/schedules/schedule_1x.py',
    '../_base_/default_runtime.py'
]
# dataset config
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/MyDataset/'
classes = ('car', 'pedestrian', 'cyclist')
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='AutoAugment',
        policies=[
            [dict(
                type='Rotate',
                level=5,
                img_fill_val=(124, 116, 104),
                prob=0.5,
                scale=1)
            ],
            [dict(type='Rotate', level=7, img_fill_val=(124, 116, 104)),
             dict(
                 type='Translate',
                 level=5,
                 prob=0.5,
                 img_fill_val=(124, 116, 104))
            ],
        ]),
    # single-scale
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    # multi-scale (use instead of the single-scale Resize above)
    # dict(
    #     type='Resize',
    #     img_scale=[(1333, 640), (1333, 672), (1333, 704), (1333, 736),
    #                (1333, 768), (1333, 800)],
    #     multiscale_mode='value',
    #     keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/train.txt',
        img_prefix=data_root,
        pipeline=train_pipeline,
        classes=classes),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/val.txt',
        img_prefix=data_root,
        pipeline=test_pipeline,
        classes=classes),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'ImageSets/Main/test.txt',
        img_prefix=data_root,
        pipeline=test_pipeline,
        classes=classes))
# model config
model = dict(
    roi_head=dict(
        bbox_head=dict(num_classes=3)))
# schedule config
# optimizer
optimizer = dict(
    type='SGD',
    lr=0.02 / 8,
    momentum=0.9,
    weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[7])
runner = dict(max_epochs=8)
# runtime config
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=100,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
evaluation = dict(interval=1, metric='mAP')
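Partial overrides like `model = dict(roi_head=dict(bbox_head=dict(num_classes=3)))` work because the custom config is deep-merged onto the `_base_` files. A rough sketch of that merge follows; the real logic lives in `mmcv.Config`, and this toy version is only an illustration:

```python
def merge_cfg(base, override):
    """Recursively merge `override` into `base`: nested dicts are merged
    key by key, and any other value is simply replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], value)
        else:
            merged[key] = value
    return merged

base = dict(model=dict(roi_head=dict(bbox_head=dict(num_classes=80,
                                                    in_channels=256))))
child = dict(model=dict(roi_head=dict(bbox_head=dict(num_classes=3))))
cfg = merge_cfg(base, child)
# cfg keeps in_channels=256 from the base, but num_classes is now 3
```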
7 Other Modifications
This section records a few places that can easily cause errors during training and testing.
7.1 Label Classes
- Open `./mmdet/datasets/voc.py` and change the label classes `CLASSES` of `VOCDataset()`:
class VOCDataset(XMLDataset):
    CLASSES = ('car', 'pedestrian', 'cyclist')
    # original:
    # CLASSES = ('aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus',
    #            'car', 'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse',
    #            'motorbike', 'person', 'pottedplant', 'sheep', 'sofa',
    #            'train', 'tvmonitor')
- Open `./mmdet/core/evaluation/class_names.py` and change the label classes returned by `voc_classes()`:
def voc_classes():
    return [
        'car', 'pedestrian', 'cyclist'
    ]
    # original:
    # return [
    #     'aeroplane', 'bicycle', 'bird', 'boat', 'bottle', 'bus', 'car',
    #     'cat', 'chair', 'cow', 'diningtable', 'dog', 'horse', 'motorbike',
    #     'person', 'pottedplant', 'sheep', 'sofa', 'train', 'tvmonitor'
    # ]
Note: in the code above, if there is only one class, a comma must follow it, otherwise an error is raised.
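To see why the comma matters: in Python, parentheses alone do not make a tuple.

```python
classes_ok = ('car',)   # a one-element tuple
classes_bad = ('car')   # just a parenthesized string, not a tuple

print(type(classes_ok))   # <class 'tuple'>
print(type(classes_bad))  # <class 'str'>
```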
7.2 Miscellaneous
- Open `./mmdet/datasets/voc.py`. If you use a custom dataset name, comment out the `ValueError` and set `self.year` to `None`:
class VOCDataset(XMLDataset):

    def __init__(self, **kwargs):
        super(VOCDataset, self).__init__(**kwargs)
        if 'VOC2007' in self.img_prefix:
            self.year = 2007
        elif 'VOC2012' in self.img_prefix:
            self.year = 2012
        else:
            self.year = None
            # raise ValueError('Cannot infer dataset year from img_prefix')
Tips: the year mainly determines which standard is used when computing AP. VOC2007 uses 11-point interpolation (i.e. recall thresholds [0, 0.1, ..., 1]), while other datasets use the area under the precision-recall curve (AUC).
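The 11-point method can be sketched as follows. This is a simplified illustration, not mmdet's actual vectorized implementation:

```python
def voc07_11point_ap(recalls, precisions):
    """Average the best precision found at each of the 11 recall
    thresholds [0, 0.1, ..., 1.0]."""
    ap = 0.0
    for t in [i / 10 for i in range(11)]:
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        ap += (max(candidates) if candidates else 0.0) / 11
    return ap

# A detector that keeps precision 1.0 all the way to recall 1.0 scores AP = 1.0.
```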
- Open `./mmdet/datasets/xml_style.py`. If the images are not in jpg format, change the extension in `filename` and `img_path` accordingly:
def load_annotations(self, ann_file):
    data_infos = []
    img_ids = mmcv.list_from_file(ann_file)
    for img_id in img_ids:
        # filename = f'JPEGImages/{img_id}.jpg'
        filename = f'JPEGImages/{img_id}.png'
        xml_path = osp.join(self.img_prefix, 'Annotations',
                            f'{img_id}.xml')
        tree = ET.parse(xml_path)
        root = tree.getroot()
        size = root.find('size')
        width = 0
        height = 0
        if size is not None:
            width = int(size.find('width').text)
            height = int(size.find('height').text)
        else:
            # img_path = osp.join(self.img_prefix, 'JPEGImages',
            #                     '{}.jpg'.format(img_id))
            img_path = osp.join(self.img_prefix, 'JPEGImages',
                                '{}.png'.format(img_id))
            img = Image.open(img_path)
            width, height = img.size
        data_infos.append(
            dict(id=img_id, filename=filename, width=width, height=height))
    return data_infos
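If a dataset mixes extensions, a small helper could pick the right file per image. `find_image` and its defaults below are hypothetical, not part of MMDetection:

```python
import os.path as osp

def find_image(img_prefix, img_id, exts=('.jpg', '.png', '.jpeg')):
    """Hypothetical helper: return the first existing image path for
    img_id under JPEGImages, trying each extension in turn."""
    for ext in exts:
        path = osp.join(img_prefix, 'JPEGImages', img_id + ext)
        if osp.exists(path):
            return path
    raise FileNotFoundError(f'no image found for {img_id}')
```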
If the annotation files have no `difficult` tag, set `difficult` to `0`:
def get_ann_info(self, idx):
    img_id = self.data_infos[idx]['id']
    xml_path = osp.join(self.img_prefix, 'Annotations', f'{img_id}.xml')
    tree = ET.parse(xml_path)
    root = tree.getroot()
    for obj in root.findall('object'):
        name = obj.find('name').text
        if name not in self.CLASSES:
            continue
        label = self.cat2label[name]
        # difficult = int(obj.find('difficult').text)
        try:
            difficult = int(obj.find('difficult').text)
        except AttributeError:
            difficult = 0
Note: the latest version has already fixed this issue, so it can be ignored there.
- Open `./tools/robustness_eval.py` and change the `20` in `results` to the number of classes in your dataset:
def get_voc_style_results(filename, prints='mPC', aggregate='benchmark'):
    eval_output = mmcv.load(filename)
    num_distortions = len(list(eval_output.keys()))
    # results = np.zeros((num_distortions, 6, 20), dtype='float32')
    results = np.zeros((num_distortions, 6, 3), dtype='float32')
8 Conclusion
If this post helped, please give it a like before you go. Thanks!