S3FD: Single Shot Scale-invariant Face Detector [Reading Notes]

Paper link: https://arxiv.org/abs/1708.05237


1) Proposing a scale-equitable face detection framework to handle different scales of faces well.

2) Improving the recall rate of small faces by a scale compensation anchor matching strategy.

3) Reducing the false positive rate of small faces via a max-out background label.



Compared with other methods, anchor-based detection methods are more robust in complicated scenes and their speed is invariant to the number of objects. However, as indicated in [12], the performance of anchor-based detectors drops dramatically as the objects become smaller.



1. Biased framework (unsuitable network architecture)

(1) Firstly, the stride size of the lowest anchor-associated layer is too large (e.g., 8 pixels in [26] and 16 pixels in [38]), so small and medium faces are highly squeezed on these layers and have few features for detection. See Fig. 1(a).


(2) Secondly, small faces, anchor scale and receptive field are mutually mismatched: the anchor scale mismatches the receptive field, and both are too large to fit a small face. See Fig. 1(b).


2. Anchor matching strategy.

Faces whose scales lie far from the anchor scales cannot match enough anchors, such as the tiny and outer faces in Fig. 1(c), which leads to their low recall rate.


3. Background from small anchors.

As illustrated in Fig. 1(d), these small anchors lead to a sharp increase in the number of negative anchors on the background, bringing about many false positive faces.
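To get a feel for how quickly small anchors pile up, here is a rough count of anchors per detection layer. The 640x640 input size, the strides 4 to 128, and the single square anchor per feature-map location are values I recall from the paper rather than from these notes, so treat them as assumptions.

```python
# Assumptions (not stated in these notes): 640x640 input, six detection
# layers with strides 4..128, and one square anchor per feature-map cell.
INPUT_SIZE = 640
STRIDES = [4, 8, 16, 32, 64, 128]


def anchors_per_layer(input_size, strides):
    """Number of anchors per layer: one per cell, (input_size // stride)^2."""
    return [(input_size // s) ** 2 for s in strides]


counts = anchors_per_layer(INPUT_SIZE, STRIDES)
total = sum(counts)
share_smallest = counts[0] / total  # share held by the stride-4 layer

for stride, count in zip(STRIDES, counts):
    print(f"stride {stride:>3}: {count:>6} anchors ({count / total:.1%})")
```

Under these assumptions the stride-4 layer alone holds roughly three quarters of all anchors, and almost all of them land on background.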



1. Scale-equitable face detection framework

Fig. 3(a) shows that the effective receptive field is much smaller than the theoretical one. According to this theory, the anchor should be significantly smaller than the theoretical receptive field in order to match the effective receptive field (see the specific example in Fig. 3(b)).

As shown in the second and third columns of Tab. 1, the scale of each anchor is 4 times its interval. We call this the equal-proportion interval principle (illustrated in Fig. 3(c)), which guarantees that anchors of different scales have the same density on the image, so that faces of various scales can match approximately the same number of anchors.

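The network architecture still follows SSD. Because the anchor scales of the original network are somewhat large, the authors redefine them. The authors also note that the stride determines the anchor interval, so each layer's stride is set to 1/4 of that layer's anchor scale. They call this the equal-proportion interval principle.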
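The "same density" claim is easy to check numerically. The sketch below tiles anchor centers on a grid for each layer and counts how many fall inside a face that is exactly as large as that layer's anchor. The strides and the 640-pixel input are assumptions recalled from the paper, and the face position is hypothetical.

```python
INPUT_SIZE = 640                   # assumed input resolution
STRIDES = [4, 8, 16, 32, 64, 128]  # assumed detection-layer strides


def anchor_centers_1d(input_size, stride):
    """Anchor center coordinates along one axis for a layer with this stride."""
    return [stride * i + stride / 2 for i in range(input_size // stride)]


def anchors_centered_in(face, input_size, stride):
    """Count anchors whose center lies inside the face box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = face
    centers = anchor_centers_1d(input_size, stride)
    nx = sum(1 for c in centers if x0 <= c < x1)
    ny = sum(1 for c in centers if y0 <= c < y1)
    return nx * ny


density = {}
for stride in STRIDES:
    scale = 4 * stride  # equal-proportion interval: scale = 4 x interval
    face = (10, 10, 10 + scale, 10 + scale)  # hypothetical face, same size as the anchor
    density[scale] = anchors_centered_in(face, INPUT_SIZE, stride)

print(density)
```

Every scale ends up with the same count (4 centers per axis, 16 in total), which is what keeping the scale-to-stride ratio fixed at 4 is meant to achieve.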

2. Scale compensation anchor matching strategy


Stage one: We follow the current anchor matching method but decrease the threshold from 0.5 to 0.35 in order to increase the average number of matched anchors.

Stage two: After stage one, some faces still do not match enough anchors, such as the tiny and outer faces marked with the gray dotted curve in Fig. 4(a). We deal with each of these faces as follows:

first pick out the anchors whose Jaccard overlap with this face is higher than 0.1, then sort them and select the top-N as the matched anchors of this face. We set N to the average number of matched anchors from stage one.
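The two stages can be sketched in a few lines of plain Python. This is only my reading of the text, not the authors' code: treating "not enough anchors" as "fewer than N", rounding the average to get N, and skipping the usual rule that an anchor belongs to at most one face are all my own simplifications. The boxes in the example are made up.

```python
def iou(a, b):
    """Jaccard overlap of two boxes given as (x0, y0, x1, y1)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def scale_compensation_match(faces, anchors, t1=0.35, t2=0.1):
    """Return (anchor indices matched to each face, N)."""
    overlaps = [[iou(f, a) for a in anchors] for f in faces]
    # Stage one: ordinary threshold matching with the lowered threshold.
    matches = [[j for j, o in enumerate(row) if o > t1] for row in overlaps]
    # N = average number of stage-one matches (rounding is my assumption).
    n = round(sum(len(m) for m in matches) / len(faces)) if faces else 0
    # Stage two: faces with fewer than N matches take their top-N anchors
    # among those with overlap above the second threshold.
    for i, row in enumerate(overlaps):
        if len(matches[i]) < n:
            candidates = [j for j, o in enumerate(row) if o > t2]
            candidates.sort(key=lambda j: row[j], reverse=True)
            matches[i] = candidates[:n]
    return matches, n


# Hypothetical boxes: one anchor-sized face and one tiny face.
anchors = [(0, 0, 16, 16), (4, 0, 20, 16), (0, 4, 16, 20),
           (40, 40, 56, 56), (44, 40, 60, 56), (-4, 0, 12, 16)]
faces = [(0, 0, 16, 16), (44, 44, 52, 52)]
matches, n = scale_compensation_match(faces, anchors)
print(n, matches)
```

In this toy example the tiny 8x8 face overlaps its two nearest anchors at only 0.25, so it gets nothing in stage one and is rescued in stage two.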

3. Maxout background label

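This method is meant to balance the ratio of negative to positive samples. The specifics are as follows, although I did not fully understand them.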

We propose to apply a more sophisticated classification strategy on the lowest layer to handle the complicated background from small anchors. We apply the max-out background label for the conv3_3 detection layer. For each of the smallest anchors, we predict Nm scores for the background label (Nm is the number of max-out background scores) and then choose the highest as its final score, as illustrated in Fig. 4(b).
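As far as I can tell, the mechanics come down to the following: the layer outputs Nm background scores plus one face score per anchor, the background scores are collapsed to their maximum, and the result is treated as an ordinary two-class prediction. The sketch below assumes a softmax over the two remaining scores, which the quoted passage does not spell out. The function name and the scores (with Nm = 3) are made up for illustration.

```python
import math


def maxout_background_probs(bg_scores, face_score):
    """Collapse Nm background scores to their max, then softmax vs. the face score.

    Returns (p_background, p_face). The softmax step is an assumption.
    """
    bg = max(bg_scores)           # max-out: keep the strongest background score
    m = max(bg, face_score)       # subtract the max for numerical stability
    e_bg = math.exp(bg - m)
    e_face = math.exp(face_score - m)
    z = e_bg + e_face
    return e_bg / z, e_face / z


# Hypothetical scores for one conv3_3 anchor with Nm = 3.
p_bg, p_face = maxout_background_probs([0.2, 1.5, -0.3], face_score=1.5)
print(p_bg, p_face)
```

Because any one of the Nm background scores can win, the anchor is called a face only when the face score beats all of them, which should make false positives on small anchors less likely.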

