[YOLOv3 MobileNetV2] MobileNetV2 explained, and using it as the YOLOv3 backbone

1 Introduction to MobileNetV2

MobileNetV2 is a lightweight convolutional neural network built on depthwise separable convolutions.

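The figure below shows the structure of one block, which consists of three parts: an expansion layer, a depthwise convolution, and a projection layer.
The expansion layer is a 1x1 convolution that maps the low-dimensional input to a higher-dimensional space.
The depthwise convolution is a 3x3 convolution applied to each channel separately. Together with the 1x1 projection that follows, it forms a depthwise separable convolution, which needs far fewer computations and parameters than a standard convolution.
The projection layer is a 1x1 convolution that maps the high-dimensional features back to a low-dimensional space.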
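
To make the saving concrete, here is a minimal sketch that counts the weights of a standard 3x3 convolution and of its depthwise separable replacement. The helper `conv_params` and the channel counts 32 and 64 are illustrative choices, not values from this post; bias terms and BatchNorm parameters are ignored.

```python
def conv_params(c_in, c_out, kernel_size, groups=1):
    """Number of weights in a bias-free 2D convolution."""
    return (c_in // groups) * c_out * kernel_size * kernel_size


c_in, c_out, k = 32, 64, 3  # illustrative sizes, not taken from the post

# Standard convolution: every output channel looks at every input channel.
standard = conv_params(c_in, c_out, k)

# Depthwise convolution: groups == c_in, so each channel has its own k x k filter.
depthwise = conv_params(c_in, c_in, k, groups=c_in)

# Pointwise (1x1) convolution: mixes the channels and sets the output width.
pointwise = conv_params(c_in, c_out, 1)

separable = depthwise + pointwise

print("standard :", standard)
print("separable:", separable)
print("ratio    :", round(separable / standard, 4))
```

The ratio equals 1/c_out + 1/k^2, so for 3x3 kernels the separable version costs roughly one ninth of the standard convolution once the output is reasonably wide.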

Bottleneck Residual Block

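Viewed from the outside, the block is a shortcut (residual) connection. Inside, it is CBR + CBR + CB: the last stage has no ReLU. This is what the paper calls a linear activation, i.e. the identity function ( $f(x)=x$ ). [Note: CBR stands for Conv + BN + ReLU; the code below uses ReLU6.]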

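An inverted residual block is wide in the middle and narrow at both ends, like a spindle. A regular residual block is the opposite: wide at both ends and narrow in the middle.
How much wider does the inverted residual get, and how much does it narrow again? This is set by the expansion factor, which controls the width of the hidden layers: the expansion layer multiplies the input channel count by this factor. In blocks with a shortcut, the block narrows by the same factor it widened by, so the output has the same number of channels as the input and the two can be added. The factor is usually 6 but can be changed, as the figure below shows.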

Inverted residuals
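
As a rough illustration of the expansion factor, the sketch below lists the channel widths that flow through one block and reports whether the shortcut can be used. The function name `block_widths` is hypothetical; the hidden-width and shortcut rules mirror the `InvertedResidual` class in section 3.

```python
def block_widths(inp, oup, stride, expand_ratio):
    """Channel widths through one inverted residual block.

    Returns ([input, after expansion, after depthwise, output], shortcut_used).
    """
    hidden = int(round(inp * expand_ratio))
    # The shortcut needs identical input and output shapes:
    # no spatial downsampling and the same number of channels.
    shortcut_used = stride == 1 and inp == oup
    return [inp, hidden, hidden, oup], shortcut_used


# Narrow -> wide -> narrow, same width at both ends: shortcut is used.
print(block_widths(24, 24, stride=1, expand_ratio=6))

# First block of a stage: the shape changes, so there is no shortcut.
print(block_widths(24, 32, stride=2, expand_ratio=6))
```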

2 A typical MobileNetV2 architecture

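In the table below, t is the expansion factor of the bottleneck (how many times the channel count is multiplied), c is the number of output channels, n is the number of times the block is repeated, and s is the stride of the first block in the group (the remaining repeats use stride 1). The stride controls the feature-map size: with 1 the size is unchanged, with 2 the width and height are halved.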

An example architecture table
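
One way to read such a table is to trace the feature-map size and channel count stage by stage. The sketch below does this for the default (t, c, n, s) rows used in the code in section 3, assuming a 416x416 input; the helper names `halve` and `trace` are hypothetical. It also prints the index of the last module of each stage inside `features`, which is what the backbone slicing in section 4 relies on.

```python
# (t, c, n, s) rows, copied from the default configuration in section 3.
SETTINGS = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]


def halve(size):
    """Output size of a 3x3 convolution with padding 1 and stride 2."""
    return (size - 1) // 2 + 1


def trace(input_size=416, stem_channels=32, settings=SETTINGS):
    """Return (last_index, size, channels) after the stem and after each stage."""
    size = halve(input_size)  # the stem convolution has stride 2
    index = 0                 # the stem is features[0]
    rows = [(index, size, stem_channels)]
    for t, c, n, s in settings:
        if s == 2:            # only the first block of a stage downsamples
            size = halve(size)
        index += n            # each stage appends n blocks to features
        rows.append((index, size, c))
    return rows


rows = trace()
for index, size, channels in rows:
    print(f"up to features[{index}]: {size}x{size}x{channels}")
```

Under these assumptions the 52x52x32, 26x26x96 and 13x13x320 maps end at indices 6, 13 and 17, which matches the `[:7]`, `[7:14]` and `[14:18]` slices used later.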

3 MobileNetV2 code

The code below is runnable and reports the network's computation cost and parameter count.

import torch
from torch import nn
# from torchvision.models.utils import load_state_dict_from_url     # use this on older torchvision versions
from torch.hub import load_state_dict_from_url      # downloads pretrained weights from a URL

model_urls = {
    'mobilenet_v2': 'https://download.pytorch.org/models/mobilenet_v2-b0353104.pth',
}

# ------------------------------------------------------#
#   Rounds the channel count to a multiple of divisor (8 in this network).
# ------------------------------------------------------#
def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# ------------------------------------------------------#
#   Conv + BN + ReLU6 bundled together; argument order: input channels, output channels, ...
#   Used so often that it is wrapped into one module
# ------------------------------------------------------#
class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True)
        )


# ------------------------------------------------------#
#   InvertedResidual: widen first, then narrow
#   Argument order: input channels, output channels, stride, expansion ratio
# ------------------------------------------------------#
class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        # The hidden dimension is simply input channels * expansion ratio
        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))   # pointwise

        layers.extend([
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),   # depthwise

            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),    # pointwise-linear
            nn.BatchNorm2d(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.0, inverted_residual_setting=None, round_nearest=8):
        """
        MobileNet V2 main class
        Args:
            num_classes (int): Number of classes
            width_mult (float): Width multiplier - adjusts number of channels in each layer by this amount
            inverted_residual_setting: Network structure
            round_nearest (int): Round the number of channels in each layer to be a multiple of this number
                                 Set to 1 to turn off rounding
        """
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280

        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                # 208,208,32 -> 208,208,16
                [1, 16, 1, 1],
                # 208,208,16 -> 104,104,24
                [6, 24, 2, 2],
                # 104,104,24 -> 52,52,32
                [6, 32, 3, 2],

                # 52,52,32 -> 26,26,64
                [6, 64, 4, 2],
                # 26,26,64 -> 26,26,96
                [6, 96, 3, 1],

                # 26,26,96 -> 13,13,160
                [6, 160, 3, 2],
                # 13,13,160 -> 13,13,320
                [6, 320, 1, 1],
            ]

        # only check the first element, assuming user knows t,c,n,s are required
        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:
            raise ValueError("inverted_residual_setting should be non-empty "
                             "or a 4-element list, got {}".format(inverted_residual_setting))

        # building first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)

        # 416,416,3 -> 208,208,32
        features = [ConvBNReLU(3, input_channel, stride=2)]

        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                # block is the InvertedResidual class defined above
                features.append(block(input_channel, output_channel, stride, expand_ratio=t))
                input_channel = output_channel

        # building last several layers
        features.append(ConvBNReLU(input_channel, self.last_channel, kernel_size=1))
        # make it nn.Sequential
        self.features = nn.Sequential(*features)

        # building classifier
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(self.last_channel, num_classes),
        )

        # weight initialization
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.zeros_(m.bias)

    def forward(self, x):
        x = self.features(x)
        x = x.mean([2, 3])      # global average pooling: average over H and W (dims 2 and 3), (N, C, H, W) -> (N, C)
        x = self.classifier(x)
        return x


def mobilenet_v2(pretrained=False, progress=True):
    model = MobileNetV2()
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['mobilenet_v2'], model_dir="model_data",
                                              progress=progress)
        model.load_state_dict(state_dict)

    return model


if __name__ == "__main__":
    model = mobilenet_v2()
    print(model)

    # ------------------------------------#
    # Method 1: report computation cost and parameter count (torchsummaryX)
    # ------------------------------------#
    from torchsummaryX import summary
    summary(model, torch.zeros(1, 3, 416, 416))

    # ------------------------------------#
    # Method 2: report computation cost and parameter count (thop)
    # ------------------------------------#
    from thop import profile
    input = torch.randn(1, 3, 416, 416)     # one 3-channel 416x416 image as input
    flops, params = profile(model, (input,))
    print(flops, params)

4 YOLOv3 network model: the backbone can be MobileNetV2 or Darknet53

This section can be read alongside the post "[YOLOv3 net] network structure and code walkthrough".

from collections import OrderedDict
import torch
import torch.nn as nn
from nets.darknet import darknet53              # for an analysis of darknet53 see https://www.jianshu.com/p/6b4675a9f378
from nets.mobilenet_v2 import mobilenet_v2      # the code from section 3 above


# --------------------------------------------------#
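#   The YOLOv3 FPN detection head needs outputs from three places in the backbone.
#   model.features behaves like a list of blocks, so it can be sliced by index
#   to obtain out3, out4, out5 (strides 8, 16 and 32).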
# --------------------------------------------------#
class MobileNetV2(nn.Module):
    def __init__(self, pretrained = False):
        super(MobileNetV2, self).__init__()
        self.model = mobilenet_v2(pretrained=pretrained)

    def forward(self, x):
        out3 = self.model.features[:7](x)
        out4 = self.model.features[7:14](out3)
        out5 = self.model.features[14:18](out4)
        return out3, out4, out5

# --------------------------------------------------#
#   Another bundle: Conv + BN + LeakyReLU
# --------------------------------------------------#
def conv2d(filter_in, filter_out, kernel_size):
    pad = (kernel_size - 1) // 2 if kernel_size else 0
    return nn.Sequential(OrderedDict([
        ("conv", nn.Conv2d(filter_in, filter_out, kernel_size=kernel_size, stride=1, padding=pad, bias=False)),
        ("bn", nn.BatchNorm2d(filter_out)),
        ("relu", nn.LeakyReLU(0.1)),
    ]))

# ------------------------------------------------------------------------#
#   make_last_layers contains seven convolutions in total; the first five extract features,
#   the last two produce the YOLO predictions and are called the yolo head
# ------------------------------------------------------------------------#
def make_last_layers(filters_list, in_filters, out_filter):
    m = nn.Sequential(
        conv2d(in_filters, filters_list[0], 1),         # 1 is the kernel_size
        conv2d(filters_list[0], filters_list[1], 3),
        conv2d(filters_list[1], filters_list[0], 1),
        conv2d(filters_list[0], filters_list[1], 3),
        conv2d(filters_list[1], filters_list[0], 1),
        conv2d(filters_list[0], filters_list[1], 3),
        nn.Conv2d(filters_list[1], out_filter, kernel_size=1, stride=1, padding=0, bias=True)
    )
    return m

# ---------------------------------------------------#
#   Read the class names
# ---------------------------------------------------#
def get_classes(classes_path):
    with open(classes_path, encoding='utf-8') as f:
        class_names = f.readlines()
    class_names = [c.strip() for c in class_names]
    return class_names, len(class_names)

class YoloBody(nn.Module):
    def __init__(self, anchors_mask, num_classes, backbone="mobilenetv2"):
        super(YoloBody, self).__init__()
        #---------------------------------------------------#   
        #   Build the backbone.
        #   With darknet53 the three effective feature maps have shapes:
        #   52,52,256
        #   26,26,512
        #   13,13,1024
        #---------------------------------------------------#
        if backbone == "darknet53":
            self.backbone = darknet53()
            in_filters = [256, 512, 1024]
        elif backbone == "mobilenetv2":
            #---------------------------------------------------#
            #   52,52,32; 26,26,96; 13,13,320
            #---------------------------------------------------#
            self.backbone   = MobileNetV2(pretrained=False)
            in_filters      = [32, 96, 320]
        else:
            raise ValueError('Unsupported backbone - `{}`, Use darknet53, mobilenetv2.'.format(backbone))

        #---------------------------------------------------#
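        #   With darknet53, out_filters would be [64, 128, 256, 512, 1024] and the last three are used for FPN fusion.
        #   Here the three channel counts chosen above are used directly.
        #---------------------------------------------------#
        # out_filters = self.backbone.layers_out_filters      # output channels of the Darknet53 stages, consumed by make_last_layers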
        out_filters = in_filters

        #------------------------------------------------------------------------#
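        #   Compute the output channels of the yolo_head. For the VOC dataset:
        #   final_out_filter0 = final_out_filter1 = final_out_filter2 = 75
        #   final_out_filter0 = len(anchors_mask[0]) * (num_classes + 5) = 3*(20+5)
        #   Meaning of 3*(20+5):
        #       3 is the number of anchor boxes per grid cell,
        #       20 is the number of VOC classes (COCO has 80), and 5 is
        #       4 box adjustment parameters + 1 objectness score (whether the cell contains an object)
        #   anchors_mask: selects which of the 9 anchor sizes each head uses; usually left unchanged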
        #------------------------------------------------------------------------#
        self.last_layer0            = make_last_layers([512, 1024], out_filters[-1], len(anchors_mask[0]) * (num_classes + 5))

        self.last_layer1_conv       = conv2d(512, 256, 1)   # 1x1 convolution that reduces the channel count
        self.last_layer1_upsample   = nn.Upsample(scale_factor=2, mode='nearest')   # upsampling: channels unchanged, w and h doubled
        self.last_layer1            = make_last_layers([256, 512], out_filters[-2] + 256, len(anchors_mask[1]) * (num_classes + 5))

        self.last_layer2_conv       = conv2d(256, 128, 1)
        self.last_layer2_upsample   = nn.Upsample(scale_factor=2, mode='nearest')
        self.last_layer2            = make_last_layers([128, 256], out_filters[-3] + 128, len(anchors_mask[2]) * (num_classes + 5))

    def forward(self, x):
        #---------------------------------------------------#   
        #   Get the three effective feature maps. With darknet53 their shapes are
        #   52,52,256; 26,26,512; 13,13,1024 (with mobilenetv2: 52,52,32; 26,26,96; 13,13,320)
        #---------------------------------------------------#
        x2, x1, x0 = self.backbone(x)       # backbone return out3, out4, out5

        #---------------------------------------------------#
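        #   First feature level
        #   out0 = (batch_size,75,13,13) for VOC, (batch_size,255,13,13) for COCO
        #---------------------------------------------------#
        # 13,13,1024 (320 with mobilenetv2) -> 13,13,512 -> 13,13,1024 -> 13,13,512 -> 13,13,1024 -> 13,13,512
        # The yolo head holds seven convolutions (wrapped in nn.Sequential). The first 5 extract features, and their
        # output also goes through conv + upsample to be fused with the shallower feature map, forming the FPN.
        # Hence the [:5] and [5:] slices here.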
        out0_branch = self.last_layer0[:5](x0)
        out0        = self.last_layer0[5:](out0_branch)     # torch.size([1,75,13,13])

        # 13,13,512 -> 13,13,256 -> 26,26,256
        x1_in = self.last_layer1_conv(out0_branch)
        x1_in = self.last_layer1_upsample(x1_in)

        # darknet53: 26,26,256 + 26,26,512 -> 26,26,768; mobilenetv2: 26,26,256 + 26,26,96 -> 26,26,352
        x1_in = torch.cat([x1_in, x1], 1)       # fusion is a concatenation of feature maps along dim 1 (channels), so the channel count grows
        #---------------------------------------------------#
        #   Second feature level
        #   out1 = (batch_size,75,26,26) for VOC, (batch_size,255,26,26) for COCO
        #---------------------------------------------------#
        # 26,26,768 (352 with mobilenetv2) -> 26,26,256 -> 26,26,512 -> 26,26,256 -> 26,26,512 -> 26,26,256
        out1_branch = self.last_layer1[:5](x1_in)
        out1        = self.last_layer1[5:](out1_branch)     # torch.size([1,75,26,26])

        # 26,26,256 -> 26,26,128 -> 52,52,128
        x2_in = self.last_layer2_conv(out1_branch)
        x2_in = self.last_layer2_upsample(x2_in)

        # darknet53: 52,52,128 + 52,52,256 -> 52,52,384; mobilenetv2: 52,52,128 + 52,52,32 -> 52,52,160
        x2_in = torch.cat([x2_in, x2], 1)
        #---------------------------------------------------#
        #   Third feature level
        #   out2 = (batch_size,75,52,52) for VOC, (batch_size,255,52,52) for COCO
        #---------------------------------------------------#
        # 52,52,384 (160 with mobilenetv2) -> 52,52,128 -> 52,52,256 -> 52,52,128 -> 52,52,256 -> 52,52,128
        out2 = self.last_layer2(x2_in)      # torch.size([1,75,52,52])
        return out0, out1, out2


if __name__ == '__main__':
    classes_path = '../model_data/voc_classes.txt'      # contents listed below
    class_names, num_classes = get_classes(classes_path)
    anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]  # 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
    model = YoloBody(anchors_mask, num_classes, backbone="mobilenetv2")       # backbone="mobilenetv2"     or     darknet53
    print(model)

    from torchsummaryX import summary
    summary(model, torch.zeros(1, 3, 416, 416))

    from thop import profile
    input = torch.randn(1, 3, 416, 416)     # one 3-channel 416x416 image as input
    flops, params = profile(model, (input,))
    print(flops, params)
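
The shape comments in `forward` depend on the chosen backbone, which is easy to lose track of. As a small check, the sketch below computes the channel counts after the two `torch.cat` calls for both backbones. The 256 and 128 are the output widths of `last_layer1_conv` and `last_layer2_conv` in the listing; the function name `concat_channels` is hypothetical.

```python
# Channel counts of (out3, out4, out5) for each backbone, as set in YoloBody.
BACKBONE_FILTERS = {
    "darknet53": [256, 512, 1024],
    "mobilenetv2": [32, 96, 320],
}


def concat_channels(in_filters):
    """Channels entering last_layer1 and last_layer2 after concatenation."""
    c3, c4, _c5 = in_filters
    return {
        "26x26": 256 + c4,  # upsampled 256-channel branch + out4
        "52x52": 128 + c3,  # upsampled 128-channel branch + out3
    }


for name, filters in BACKBONE_FILTERS.items():
    print(name, concat_channels(filters))
```

These values are exactly the `out_filters[-2] + 256` and `out_filters[-3] + 128` arguments passed to `make_last_layers`, so the heads adapt to the backbone without further changes.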

Contents of voc_classes.txt:

aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
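
With these 20 classes, each detection head outputs 3 * (20 + 5) = 75 channels. The sketch below works that out and also shows which anchor sizes each head receives through `anchors_mask`. The anchor list is the one given in the comment next to `anchors_mask` in the listing; the grid sizes assume a 416x416 input, and `describe_heads` is a hypothetical helper.

```python
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat",
    "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]

# (width, height) anchors, from the comment next to anchors_mask in the listing.
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]
ANCHORS_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
GRID_SIZES = [13, 26, 52]  # out0, out1, out2 for a 416x416 input


def describe_heads(num_classes, anchors, anchors_mask, grid_sizes):
    """Pair each head with its grid size, anchors and output channel count."""
    heads = []
    for grid, mask in zip(grid_sizes, anchors_mask):
        heads.append({
            "grid": grid,
            "anchors": [anchors[i] for i in mask],
            # per anchor: 4 box offsets + 1 objectness + one score per class
            "channels": len(mask) * (num_classes + 5),
        })
    return heads


heads = describe_heads(len(VOC_CLASSES), ANCHORS, ANCHORS_MASK, GRID_SIZES)
for head in heads:
    print(head)
```

The coarsest 13x13 grid gets the three largest anchors and the finest 52x52 grid gets the three smallest, so large objects are predicted on the low-resolution map and small objects on the high-resolution one.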

References

https://zhuanlan.zhihu.com/p/98874284
https://github.com/bubbliiiing/mobilenet-yolov4-pytorch

Last edited: 2022.04.07 21:32:13
