精简CNN模型系列之五：MobileNet v2

介绍

Depthwise Convolution应该首创于Google的MobileNet网络。自此后渐渐它已经被用于了越来越多的移动端CNN网络当中。MobileNet v2相对于MobileNet v1而言没有新的计算单元的改变，有的只是结构的微调。它将Depthwise Convolution用于Residual module当中，并试着用理论与试验证明了直接在thinner的bottleneck层上进行skip learning连接以及对bottleneck layer不进行ReLu非线性处理可取得共好的结果。这么说来相对于之前的MobileNet v1而言还是有进步的。

当然它还对标了下Face++的ShuffleNet v1，在内存/计算效率上都实现了对此一对手的反超。

MobileNetV2中使用的新颖的module

MobileNet v2的基础元素

Depthwise Convolution

构成MobileNet v2的主要module是基于一个带bottleneck的residual module而设计的。其上最大的一个变化（此变化亦可从MobileNet v1中follow而来）即是其上的3x3 conv使用了效率更高的Depthwise Conv（当然是由Depthiwise conv + pointwise conv组成）。它可有效地节省所需的真正计算。

线性的bottlenecks使用

通常我们使用1x1 conv的bottleneck在residual module当中（或其它像Inception module等）都是借鉴了之前在Network in Network中的想法即通过它来降低feature maps的channel数目以使得真正计算密集的3x3 conv计算量不那么大。

本篇paper中，作者们从理论上表明本质上ReLu对于feature maps之上的那些非零正数并不构成影响，而只是对它们进行线性变换。它们提出了如下两条ReLu使用的指导法则。

如果一个tensor（一般为feature map）在经ReLu非线性变换处理后如果非零的量并没有变化，那么它本质上相当于是经过了一个线性变换；
只有当输入的feature maps恰能被表示于输入整体空间的一个底维度组合当中时，使用ReLu在其后才能完整保留所有的信息量（坦白说，理解的亦很模糊，不多说了）。

相反的residual连接

与通用的residual module在bottleneck之前的较宽层上进行skip learning连接不同，mobilenet v2中的module直接在bottleneck layer上进行skip learning连接。理论上bottleneck layer作为module中串行管道的一个环节，它已经容纳了所有通过的信息，所以直接对其进行连接还可节省所需的internal tensor数量，从而减少内存需求。

基本module层次结构

可以看出residual module内部上来先是一个bottleneck layer，它不像之前意义上的bottleneck那样寻求减少输入channels数目（因为它输入feature maps的channel数目已经很少了，算是较thinner的tensor），而是对feature maps进行channels扩展，这里扩展系数为t，最后3x3 conv（Depthwise conv）再对它进行表达式的转换等操作，然后再以一个1x1 conv的缩减输出。整体而言它的输入channels数目为k，输出则为k^'。

MobileNet v2结构

下表中可见MobileNet v2的整体网络结构。

MobileNetV2的网络结构

它同我们之前见识过的许多CNN网络有许多一脉相承的地方，比如上来先使用普通的conv进行基础特征提取，然后再使用新颖的residual module一级级处理，feature map size越来越小，但其channels数目则不断增加。另外每个residual module内部使用的扩展因子都为6（作者在5-10范围上作了实验，最终选用6）。

此外，我们亦可像MobileNet v1中做的那样，使用width multiplier通过对输入图片大小进行调整以控制整体需要的计算与模型大小。

下表为MobileNet v2与MobileNet v1及ShuffleNet在模型体积大小上的比较。

MobileNetV2与MobileNetV1及ShuffleNet在模型大小的比较

下图为MobileNet v2与其它较新的移动端CNN网络结构比较。NasNet显得够复杂的。

MobileNetV2与其它较新的移动端CNN网络结构比较

实验结果

下表为MobileNet v2与其它移动端CNN网络在ImageNet上的精度及模型大小、计算量等比较。再次发现人家Google这次又是使用16个GPUs以async的方式来训练的，真是技艺高超啊。

MobileNetV2与其它CNN网络在ImageNet上的精度及计算等比较

代码分析

下面为具体的实现的网络结构的层次描述。相对于Pytorch, Tensorflow不大好的一个地方是有太多的参数与APIs了。或者它用起来很强大，但学习成本却太高。。对于Data scientist而言，pytorch的code写起来可真是容易太多。

V2_DEF = dict(
    defaults={
        # Note: these parameters of batch norm affect the architecture
        # that's why they are here and not in training_scope.
        (slim.batch_norm,): {'center': True, 'scale': True},
        (slim.conv2d, slim.fully_connected, slim.separable_conv2d): {
            'normalizer_fn': slim.batch_norm, 'activation_fn': tf.nn.relu6
        },
        (ops.expanded_conv,): {
            'expansion_size': expand_input(6),
            'split_expansion': 1,
            'normalizer_fn': slim.batch_norm,
            'residual': True
        },
        (slim.conv2d, slim.separable_conv2d): {'padding': 'SAME'}
    },
    spec=[
        op(slim.conv2d, stride=2, num_outputs=32, kernel_size=[3, 3]),
        op(ops.expanded_conv,
           expansion_size=expand_input(1, divisible_by=1),
           num_outputs=16),
        op(ops.expanded_conv, stride=2, num_outputs=24),
        op(ops.expanded_conv, stride=1, num_outputs=24),
        op(ops.expanded_conv, stride=2, num_outputs=32),
        op(ops.expanded_conv, stride=1, num_outputs=32),
        op(ops.expanded_conv, stride=1, num_outputs=32),
        op(ops.expanded_conv, stride=2, num_outputs=64),
        op(ops.expanded_conv, stride=1, num_outputs=64),
        op(ops.expanded_conv, stride=1, num_outputs=64),
        op(ops.expanded_conv, stride=1, num_outputs=64),
        op(ops.expanded_conv, stride=1, num_outputs=96),
        op(ops.expanded_conv, stride=1, num_outputs=96),
        op(ops.expanded_conv, stride=1, num_outputs=96),
        op(ops.expanded_conv, stride=2, num_outputs=160),
        op(ops.expanded_conv, stride=1, num_outputs=160),
        op(ops.expanded_conv, stride=1, num_outputs=160),
        op(ops.expanded_conv, stride=1, num_outputs=320),
        op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=1280)
    ],
)

下面为slim库中生成mobilenet model的函数。它的骨干又要调用另外一个mobilenetbase函数来完成，见后面。

@slim.add_arg_scope
def mobilenet(inputs,
              num_classes=1001,
              prediction_fn=slim.softmax,
              reuse=None,
              scope='Mobilenet',
              base_only=False,
              **mobilenet_args):
  """Mobilenet model for classification, supports both V1 and V2.
  Note: default mode is inference, use mobilenet.training_scope to create
  training network.
  Args:
    inputs: a tensor of shape [batch_size, height, width, channels].
    num_classes: number of predicted classes. If 0 or None, the logits layer
      is omitted and the input features to the logits layer (before dropout)
      are returned instead.
    prediction_fn: a function to get predictions out of logits
      (default softmax).
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.
    base_only: if True will only create the base of the network (no pooling
    and no logits).
    **mobilenet_args: passed to mobilenet_base verbatim.
      - conv_defs: list of conv defs
      - multiplier: Float multiplier for the depth (number of channels)
      for all convolution ops. The value must be greater than zero. Typical
      usage will be to set this value in (0, 1) to reduce the number of
      parameters or computation cost of the model.
      - output_stride: will ensure that the last layer has at most total stride.
      If the architecture calls for more stride than that provided
      (e.g. output_stride=16, but the architecture has 5 stride=2 operators),
      it will replace output_stride with fractional convolutions using Atrous
      Convolutions.
  Returns:
    logits: the pre-softmax activations, a tensor of size
      [batch_size, num_classes]
    end_points: a dictionary from components of the network to the corresponding
      activation tensor.
  Raises:
    ValueError: Input rank is invalid.
  """
  is_training = mobilenet_args.get('is_training', False)
  input_shape = inputs.get_shape().as_list()
  if len(input_shape) != 4:
    raise ValueError('Expected rank 4 input, was: %d' % len(input_shape))

  with tf.variable_scope(scope, 'Mobilenet', reuse=reuse) as scope:
    inputs = tf.identity(inputs, 'input')
    net, end_points = mobilenet_base(inputs, scope=scope, **mobilenet_args)
    if base_only:
      return net, end_points

    net = tf.identity(net, name='embedding')

    with tf.variable_scope('Logits'):
      net = global_pool(net)
      end_points['global_pool'] = net
      if not num_classes:
        return net, end_points
      net = slim.dropout(net, scope='Dropout', is_training=is_training)
      # 1 x 1 x num_classes
      # Note: legacy scope name.
      logits = slim.conv2d(
          net,
          num_classes, [1, 1],
          activation_fn=None,
          normalizer_fn=None,
          biases_initializer=tf.zeros_initializer(),
          scope='Conv2d_1c_1x1')

      logits = tf.squeeze(logits, [1, 2])

      logits = tf.identity(logits, name='output')
    end_points['Logits'] = logits
    if prediction_fn:
      end_points['Predictions'] = prediction_fn(logits, 'Predictions')
  return logits, end_points

下面是择要的mobilenetbase函数的主干。。已经无力吐槽。。怀疑这么一个CNN就写成这样多层次，有必要吗？很是好奇是否真的有人会像笔者一样有耐心将它看完。绝不会再看第二遍了。。

 if multiplier <= 0:
    raise ValueError('multiplier is not greater than zero.')

  # Set conv defs defaults and overrides.
  conv_defs_defaults = conv_defs.get('defaults', {})
  conv_defs_overrides = conv_defs.get('overrides', {})
  if use_explicit_padding:
    conv_defs_overrides = copy.deepcopy(conv_defs_overrides)
    conv_defs_overrides[
        (slim.conv2d, slim.separable_conv2d)] = {'padding': 'VALID'}

  if output_stride is not None:
    if output_stride == 0 or (output_stride > 1 and output_stride % 2):
      raise ValueError('Output stride must be None, 1 or a multiple of 2.')

  # a) Set the tensorflow scope
  # b) set padding to default: note we might consider removing this
  # since it is also set by mobilenet_scope
  # c) set all defaults
  # d) set all extra overrides.
  with _scope_all(scope, default_scope='Mobilenet'), \
      safe_arg_scope([slim.batch_norm], is_training=is_training), \
      _set_arg_scope_defaults(conv_defs_defaults), \
      _set_arg_scope_defaults(conv_defs_overrides):
    # The current_stride variable keeps track of the output stride of the
    # activations, i.e., the running product of convolution strides up to the
    # current network layer. This allows us to invoke atrous convolution
    # whenever applying the next convolution would result in the activations
    # having output stride larger than the target output_stride.
    current_stride = 1

    # The atrous convolution rate parameter.
    rate = 1

    net = inputs
    # Insert default parameters before the base scope which includes
    # any custom overrides set in mobilenet.
    end_points = {}
    scopes = {}
    for i, opdef in enumerate(conv_defs['spec']):
      params = dict(opdef.params)
      opdef.multiplier_func(params, multiplier)
      stride = params.get('stride', 1)
      if output_stride is not None and current_stride == output_stride:
        # If we have reached the target output_stride, then we need to employ
        # atrous convolution with stride=1 and multiply the atrous rate by the
        # current unit's stride for use in subsequent layers.
        layer_stride = 1
        layer_rate = rate
        rate *= stride
      else:
        layer_stride = stride
        layer_rate = 1
        current_stride *= stride
      # Update params.
      params['stride'] = layer_stride
      # Only insert rate to params if rate > 1.
      if layer_rate > 1:
        params['rate'] = layer_rate
      # Set padding
      if use_explicit_padding:
        if 'kernel_size' in params:
          net = _fixed_padding(net, params['kernel_size'], layer_rate)
        else:
          params['use_explicit_padding'] = True

      end_point = 'layer_%d' % (i + 1)
      try:
        net = opdef.op(net, **params)
      except Exception:
        print('Failed to create op %i: %r params: %r' % (i, opdef, params))
        raise
      end_points[end_point] = net
      scope = os.path.dirname(net.name)
      scopes[scope] = end_point
      if final_endpoint is not None and end_point == final_endpoint:
        break

    # Add all tensors that end with 'output' to
    # endpoints
    for t in net.graph.get_operations():
      scope = os.path.dirname(t.name)
      bn = os.path.basename(t.name)
      if scope in scopes and t.name.endswith('output'):
        end_points[scopes[scope] + '/' + bn] = t.outputs[0]
    return net, end_points

参考文献

MobileNetV2: Inverted Residuals and Linear Bottlenecks, Mark-Sandler, 2018
https://github.com/tensorflow/models/tree/master/research/slim/nets/mobilenet

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 199,711评论 5赞 468
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 83,932评论 2赞 376
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 146,770评论 0赞 330
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 53,799评论 1赞 271
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 62,697评论 5赞 359
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,069评论 1赞 276
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 37,535评论 3赞 390
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,200评论 0赞 254
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,353评论 1赞 294
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,290评论 2赞 317
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,331评论 1赞 329
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,020评论 3赞 315
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,610评论 3赞 303
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,694评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 30,927评论 1赞 255
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 42,330评论 2赞 346
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 41,904评论 2赞 341