A small performance gain from a simple tweak to the MobileNet structure

This work is now on arXiv:
http://arxiv.org/abs/1802.03750
FD-MobileNet: Improved MobileNet with a Fast Downsampling Strategy

Below is a short introduction to the paper:

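In practice we found that a small adjustment to MobileNet is enough to improve its accuracy. The gain only appears in small networks below 140 MFLOPs: for example, about 3% over the original at 40 MFLOPs and about 5% over the original at 12-13 MFLOPs. Conversely, networks above this scale get slightly worse. At the same computation budget the result is slightly worse than ShuffleNet, but ShuffleNet is more complex, its extra layers cost some additional time, and it is hard to optimize in engineering terms, so this small finding is still somewhat competitive.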

Experiment Results

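Part one, MobileNetV1: the blue numbers are our reproduced results and the black numbers are those reported in the paper. The experiments were run in PyTorch on the ImageNet 2012 dataset with a standard 120-epoch training schedule. Our results are slightly higher than those in the MobileNet paper.
Parts two and three are the ShuffleNet results from the two versions of its paper: v1 is the single-column version and v2 is the two-column version.
Part four, compact-Mobilenet, is our MobileNet with the adjusted structure.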

Network Structure

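The structure is clear at a glance. The rightmost one, Compact-MNet, does not "linger" after the first stride-2 convolution but goes straight into another stride-2 convolution. If a depthwise + pointwise convolution pair is treated as one conv set, the structure simply starts with three consecutive stride-2 conv sets. The rest follows MobileNet as is. Along the way we also tried several similar adjustments to the high-level layers, and this one worked best.
The idea behind this work came from ShuffleNet: put simply, we transplanted the head structure of ShuffleNet onto MobileNet.
Our rough guess at the reason: this quickly lowers the feature-map resolution, which cuts the computation of the structurally equivalent network. A network brought back to the same computation budget then has somewhat better representational power than the original.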
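To make the "no lingering" point concrete, here is a minimal sketch that traces the feature-map size through the first four layers. The MobileNet v1 strides follow the layout in the MobileNet paper; the fast-downsampling strides are only my reading of "three consecutive stride-2 conv sets" and are an assumption, not the exact table from our paper. The function name `trace_resolution` is made up for this example.

```python
from typing import List


def trace_resolution(input_size: int, strides: List[int]) -> List[int]:
    """Return the spatial size after each layer.

    Assumes 3x3 convolutions with padding 1, so a stride-s layer maps
    size n to ceil(n / s).
    """
    sizes = []
    size = input_size
    for stride in strides:
        size = -(-size // stride)  # ceiling division
        sizes.append(size)
    return sizes


# Stem conv followed by the first three depthwise-separable conv sets.
MOBILENET_V1_HEAD_STRIDES = [2, 1, 2, 1]
# Hypothetical fast-downsampling head: three stride-2 layers in a row.
FAST_HEAD_STRIDES = [2, 2, 2, 1]

if __name__ == "__main__":
    print("MobileNet v1 head:", trace_resolution(224, MOBILENET_V1_HEAD_STRIDES))
    print("fast head:        ", trace_resolution(224, FAST_HEAD_STRIDES))
```

With a 224x224 input, the original head still works at 56x56 after four layers, while the fast head is already down to 28x28, so every later layer sees a quarter of the pixels.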

Experiment Analysis

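Because this structure is a small adjustment of the original MobileNet, the change can be as simple as editing the array of feature-map channel counts and the array of strides. Anyone who has run MobileNet code can get compact MobileNet code with almost no effort and reproduce the experiments directly. Likewise, the engineering implementation and optimization difficulty is exactly the same as for the original MobileNet, which may be an advantage over the more complex ShuffleNet structure.
The code for this will be released at the end of the month together with another work from our group.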
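As an illustration of "only the arrays change", the sketch below builds a framework-neutral list of layer descriptions from a channel array and a stride array. This is not our released code: `ConvSpec`, `build_specs`, and the `FAST_*` arrays are hypothetical names and values chosen for the example, and only the head of the network is shown. The MobileNet v1 head values follow the MobileNet paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConvSpec:
    kind: str  # "conv", "depthwise" or "pointwise"
    in_channels: int
    out_channels: int
    kernel_size: int
    stride: int


def build_specs(channels: List[int], strides: List[int],
                in_channels: int = 3) -> List[ConvSpec]:
    """Expand per-stage channel/stride arrays into a flat list of conv layers.

    Entry 0 is the standard 3x3 stem convolution; every later entry becomes
    one conv set: a 3x3 depthwise conv (which carries the stride) followed
    by a 1x1 pointwise conv.
    """
    if len(channels) != len(strides):
        raise ValueError("channels and strides must have the same length")
    specs = [ConvSpec("conv", in_channels, channels[0], 3, strides[0])]
    prev = channels[0]
    for out, stride in zip(channels[1:], strides[1:]):
        specs.append(ConvSpec("depthwise", prev, prev, 3, stride))
        specs.append(ConvSpec("pointwise", prev, out, 1, 1))
        prev = out
    return specs


# Head of MobileNet v1 (stem + first three conv sets).
V1_CHANNELS = [32, 64, 128, 128]
V1_STRIDES = [2, 1, 2, 1]
# Hypothetical fast-downsampling head: same code, different stride array.
FAST_CHANNELS = [32, 64, 128, 128]
FAST_STRIDES = [2, 2, 2, 1]

if __name__ == "__main__":
    for spec in build_specs(FAST_CHANNELS, FAST_STRIDES):
        print(spec)
```

Each `ConvSpec` maps directly onto one convolution layer in whatever framework you use, so the builder itself never needs to change between the two variants.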

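This small change is essentially a network-structure trick. Downsampling two or three times in a row at the very start is not unique to ShuffleNet; many other networks do something similar.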

English Version:

FD-MobileNet: Improved MobileNet with a Fast Downsampling Strategy

Zhaoning Zhang, Qin Zheng, Xiaotao Chen
PDL, NUDT

Abstract

We present a compact MobileNets structure. It is a finely adjusted version of the original MobileNets and performs better than its original counterpart in tiny networks of 140 MFLOPs or less. Because it adds no extra layers and therefore no extra time cost, compact MobileNets is a competitive choice for neural networks running on hardware with very limited computing power. It is also very easy to implement and optimize in engineering terms.

Experiment Results

First part: the blue results come from our own experiments with MobileNet V1. The experiments were run in PyTorch on the ImageNet 2012 dataset with a standard 120-epoch training schedule.
Second and third parts: results from the two versions of the ShuffleNet paper.
The last part shows the results of our Compact MobileNet.

Network Structure

The rightmost Compact-MobileNet differs from the original MobileNet in the head: it starts with three consecutive stride-2 convolutional sets (depthwise + pointwise conv). This structure is inspired by the head of ShuffleNet. We also tried adjusting other parts of the network, and this structure was the best in practice.
We speculate that this structure performs better in tiny networks because the feature map is downsampled at a very early stage, which sharply reduces the complexity of the counterpart structure. When the complexity is restored with the width multiplier, the representational power surpasses that of the original in tiny structures.
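The arithmetic behind this speculation can be sketched for a single depthwise + pointwise conv set. The function below uses the usual multiply-accumulate count for a depthwise separable convolution; the channel counts and resolutions are illustrative values picked for this example, not numbers from our experiments.

```python
def separable_macs(in_channels: int, out_channels: int,
                   out_size: int, kernel_size: int = 3) -> int:
    """Multiply-accumulates of one depthwise + pointwise conv set.

    depthwise: k * k * C_in * H_out * W_out
    pointwise: C_in * C_out * H_out * W_out
    """
    pixels = out_size * out_size
    depthwise = kernel_size * kernel_size * in_channels * pixels
    pointwise = in_channels * out_channels * pixels
    return depthwise + pointwise


if __name__ == "__main__":
    base = separable_macs(128, 128, out_size=56)
    halved = separable_macs(128, 128, out_size=28)   # one extra early downsample
    widened = separable_macs(256, 256, out_size=28)  # width doubled afterwards
    print(f"128 ch @ 56x56: {base:,} MACs")
    print(f"128 ch @ 28x28: {halved:,} MACs")
    print(f"256 ch @ 28x28: {widened:,} MACs")
```

Halving the resolution divides the cost of the conv set by four, and doubling the width afterwards brings it back to roughly the original budget, now spent on twice as many channels. Whether that trade helps accuracy is what the experiments test; the sketch only shows where the freed computation goes.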

Engineering

The compact structure can be reproduced by modifying only the Python arrays of feature-map filter counts and strides. It is therefore just as easy to implement and optimize as MobileNets. The code will be available soon, together with another work from our team.

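Google released the MobileNetV2 structure on January 16, 2018: https://arxiv.org/abs/1801.04381 . I have already reproduced it, and it reaches the accuracy claimed in the paper. The paper also discloses a large amount of detail; it really is a conscientious piece of work. Related Zhihu question: 如何评价mobilenet v2 ? (How would you evaluate MobileNet V2?)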

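Things are moving really fast. Since this work of mine was just a clever little trick to begin with, I will simply post an informal write-up here.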
^v^