Triplet Network, Triplet Loss, and Their TensorFlow Implementation

This article is translated from Olivier Moindrot's blog post "Triplet Loss and Online Triplet Mining in TensorFlow"; readers who prefer the original can find it on his blog.

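Earlier articles introduced the Siamese network, a simple and surprisingly effective architecture, and the basic structure of the triplet network. This article looks at some interesting aspects of the triplet loss used in triplet networks.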

1. Introduction

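In face recognition, triplet loss is commonly used to learn vector representations (embeddings) of faces. If you are not familiar with triplet loss, Andrew Ng's deep learning specialization on Coursera is a good introduction.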

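Triplet loss is hard to implement. This article covers the definition of triplet loss and the strategies for choosing triplets during training. Why is a strategy needed? The number of possible triplets is far too large, and training on all of them is inefficient, so we select the more informative triplets, which makes training both faster and more effective.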

2. Triplet loss and triplet mining

2.1 Why use triplet loss instead of softmax?

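Triplet loss was first applied to face recognition in Google's paper "FaceNet: A Unified Embedding for Face Recognition and Clustering". The Google researchers proposed training a new vector representation of faces through online triplet mining, which we discuss in detail below.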

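In supervised machine learning there is usually a fixed set of classes, so the network can be trained with a softmax-based cross-entropy loss. Sometimes, however, the number of classes is variable, and triplet loss handles that case. In tasks such as face recognition and the Quora question pairs task, the advantage of triplet loss lies in fine-grained discrimination: when two inputs are similar, triplet loss models the details better. In effect it adds a measure of how much the two inputs differ, so the network learns a better representation of its inputs and performs well on both tasks. The drawback of triplet loss is that it converges slowly and sometimes does not converge at all.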

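The motivation behind triplet loss is to place faces of the same person as "close" together as possible in the embedding space, and as "far" as possible from the faces of other people.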

2.2 Definition of triplet loss

Figure: triplet loss on two positive faces (Obama) and one negative face (Macron)

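The goals of triplet loss are:

Two examples with the same label should be close together in the new embedding space.

Two examples with different labels should be far apart in the new embedding space.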

More precisely, given two positive examples and one negative example, we want the negative to be farther from the positives than the positives are from each other, by at least some threshold: the margin.

2.3 Triplet loss is defined over triplets made up of:

- an anchor (the reference example)

- a positive of the same class as the anchor

- a negative of a different class

For a triplet (a, p, n), the triplet loss can be written as:

$L = \max(d(a,p) - d(a,n) + margin, 0)$

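Minimizing this loss pushes the distance d(a,p) between a and p toward 0 and the distance d(a,n) between a and n above d(a,p)+margin. When the negative example is easy to tell apart, the loss is 0; otherwise it is positive.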
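As a minimal sketch of the definition above, the loss for a single triplet can be computed in plain Python. The function names `squared_distance` and `triplet_loss`, the choice of squared Euclidean distance for d, and the example vectors are assumptions made for illustration; they do not come from the original post.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def triplet_loss(anchor, positive, negative, margin):
    """max(d(a, p) - d(a, n) + margin, 0) for a single triplet."""
    d_ap = squared_distance(anchor, positive)
    d_an = squared_distance(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


a = [0.0, 0.0]
p = [0.0, 1.0]       # d(a, p) = 1
n_far = [3.0, 0.0]   # d(a, n) = 9, well beyond d(a, p) + margin
n_near = [1.0, 0.0]  # d(a, n) = 1, as close to the anchor as the positive

print(triplet_loss(a, p, n_far, margin=0.5))   # 0.0
print(triplet_loss(a, p, n_near, margin=0.5))  # 0.5
```

The far negative already satisfies the margin, so it contributes no loss; the near negative violates it and is penalized.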

3. Triplet mining

Based on the definition of the loss, triplets fall into three categories:

easy triplets: triplets whose loss is 0, formally $d(a,n) > d(a,p) + margin$.

hard triplets: triplets where the negative is closer to the anchor than the positive is, formally $d(a,n) < d(a,p)$.

semi-hard triplets: triplets where the negative is farther from the anchor than the positive is, but not far enough for the loss to be 0, formally $d(a,p) < d(a,n) < d(a,p) + margin$.
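The three categories can be expressed as a small classification function. This is only a sketch: the name `triplet_category` is hypothetical, and the handling of the boundary cases (equal distances) is a choice made here, since the definitions above use strict inequalities.

```python
def triplet_category(d_ap, d_an, margin):
    """Classify a triplet from its two distances.

    d_ap: distance between anchor and positive
    d_an: distance between anchor and negative
    """
    if d_an >= d_ap + margin:
        # The loss max(d_ap - d_an + margin, 0) is 0 here.
        return "easy"
    if d_an < d_ap:
        # The negative is closer to the anchor than the positive.
        return "hard"
    # d_ap <= d_an < d_ap + margin: right order, but the margin is violated.
    return "semi-hard"


print(triplet_category(1.0, 2.0, margin=0.5))  # easy
print(triplet_category(1.0, 0.5, margin=0.5))  # hard
print(triplet_category(1.0, 1.2, margin=0.5))  # semi-hard
```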

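All three categories are defined by where the negative lies relative to the anchor and the positive. Negatives can therefore be divided into the same three classes: hard negatives, easy negatives, and semi-hard negatives. The figure below shows where the three kinds of negatives sit in the embedding space relative to the anchor and the positive.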

Figure: the three types of negatives, given an anchor and a positive

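How the triplets, or the negatives, are chosen has a large impact on how efficiently the model trains. The FaceNet paper mentioned above builds its training triplets from randomly chosen semi-hard negatives and reports good results.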

3.1 Offline and online triplet mining

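From the analysis above, easy negatives are simple to tell apart, so there is no point in building many triplets from them; doing so would seriously slow down training. Using only hard negatives, on the other hand, may hurt the quality of training. We therefore need a method for selecting triplets, that is, a way to "mine the triplets".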

3.1.1 Offline triplet mining

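Offline triplet mining feeds the entire training set through the network to obtain an embedding for every training example. From these embeddings it computes the distances between each negative and its anchor and positive, and uses those distances to decide whether a triplet is semi-hard, hard, or easy. Offline triplet mining then keeps only the hard or semi-hard triplets, because easy triplets are already satisfied and not worth training on.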

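Overall this approach is not very efficient: all the training data must first be fed through the network, and the negatives may have to be re-classified every epoch or every few epochs.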
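The filtering step of offline mining can be sketched as follows, assuming the embeddings have already been computed and turned into a distance matrix. The function name `mine_offline`, the one-dimensional toy embeddings, and the candidate triplets are all hypothetical and chosen only to keep the example short.

```python
def mine_offline(triplets, dist, margin):
    """Keep only triplets whose loss is positive (hard or semi-hard).

    triplets: list of (anchor, positive, negative) index triples
    dist:     square matrix (list of lists), dist[i][j] = d(i, j)
    """
    return [
        (a, p, n)
        for a, p, n in triplets
        if dist[a][p] - dist[a][n] + margin > 0.0
    ]


# Toy 1-D embeddings; examples 0 and 1 share a label, 2 and 3 share another.
points = [0.0, 1.0, 1.5, 10.0]
dist = [[abs(x - y) for y in points] for x in points]

candidates = [(0, 1, 2), (0, 1, 3), (1, 0, 2)]
print(mine_offline(candidates, dist, margin=1.0))
# [(0, 1, 2), (1, 0, 2)]
```

The triplet (0, 1, 3) is dropped because its negative is already far beyond the margin. Because the distances depend on the current network weights, this selection goes stale as training proceeds, which is the inefficiency described above.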

3.1.2 Online triplet mining

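To address this problem, the Google researchers proposed online triplet mining. The idea is simple: feed a batch of B images through the network to obtain B embeddings, from which at most $B^3$ triplets can be formed. Many of these are useless (for example, three examples that all share a label, or three examples with three different labels); these are called invalid triplets. Which triplets are valid? For a triplet $(B_i,B_j,B_k)$, if examples i and j have the same label but are not the same example, and example k has a different label, the triplet is valid.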
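The validity rule can be illustrated by enumerating all index triples in a small batch and keeping the valid ones. This is a brute-force sketch for clarity, not an efficient implementation; the function name `valid_triplets` and the toy labels are hypothetical.

```python
from itertools import product


def valid_triplets(labels):
    """Return all (i, j, k) where i and j are distinct examples with the
    same label and k has a different label."""
    n = len(labels)
    return [
        (i, j, k)
        for i, j, k in product(range(n), repeat=3)
        if i != j and labels[i] == labels[j] and labels[k] != labels[i]
    ]


labels = [0, 0, 1, 1]           # B = 4, so B**3 = 64 index triples in total
triplets = valid_triplets(labels)
print(len(triplets))            # 8
print((0, 1, 2) in triplets)    # True
print((0, 0, 2) in triplets)    # False: anchor and positive are the same example
```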

Suppose a batch contains P*K face images: P people with K images each.

There are two strategies for selecting among the valid triplets (from the paper "In Defense of the Triplet Loss for Person Re-Identification", arXiv:1703.07737):

- batch all: compute all the valid triplets and average the loss over the hard and semi-hard ones.

  - Easy triplets are left out of the average: their loss is 0, so including them would shrink the overall loss.

  - This produces PK(K-1)(PK-K) triplets: PK anchors, each with K-1 possible positives and PK-K possible negatives.

- batch hard: for each anchor, select the hardest positive (the positive farthest from the anchor) and the hardest negative (the negative closest to the anchor).

  - This produces PK triplets.

  - These are the hardest triplets in the batch.
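The batch hard selection and the two triplet counts can be checked on a toy batch. The sketch below works on a precomputed distance matrix in plain Python; the function name `batch_hard_triplets` and the one-dimensional embeddings are assumptions made for illustration.

```python
def batch_hard_triplets(labels, dist):
    """For each anchor, pick the farthest positive and the closest negative.

    labels: list of class labels, one per example
    dist:   square matrix (list of lists), dist[i][j] = d(i, j)
    Returns a list of (anchor, positive, negative) index triples.
    """
    n = len(labels)
    triplets = []
    for a in range(n):
        positives = [j for j in range(n) if j != a and labels[j] == labels[a]]
        negatives = [k for k in range(n) if labels[k] != labels[a]]
        if not positives or not negatives:
            continue  # this anchor cannot form a valid triplet
        hardest_p = max(positives, key=lambda j: dist[a][j])
        hardest_n = min(negatives, key=lambda k: dist[a][k])
        triplets.append((a, hardest_p, hardest_n))
    return triplets


P, K = 2, 3                                  # 2 classes, 3 examples each
labels = [0, 0, 0, 1, 1, 1]
points = [0.0, 1.0, 3.0, 1.5, 4.0, 5.0]      # toy 1-D embeddings
dist = [[abs(x - y) for y in points] for x in points]

hard = batch_hard_triplets(labels, dist)
print(len(hard))                             # 6, i.e. P * K
print(hard[0])                               # (0, 2, 3)
print(P * K * (K - 1) * (P * K - K))         # 36 triplets for batch all
```

For anchor 0, the farthest positive is example 2 (distance 3.0) and the closest negative is example 3 (distance 1.5), which gives the triplet (0, 2, 3).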

Online triplet loss

The experiments in the paper "In Defense of the Triplet Loss for Person Re-Identification" (arXiv:1703.07737) show that batch hard performs best.

4. How do we implement triplet loss in TensorFlow?

4.1 offline triplets

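With offline triplets this is simple: apply the triplet loss formula above to the pre-selected triplets. A TensorFlow implementation looks like this: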

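```python
import tensorflow as tf

# Embeddings produced by the network for each element of the triplets.
# `margin` is assumed to be defined elsewhere as a Python float.
anchor_output = ...    # shape [None, 128]
positive_output = ...  # shape [None, 128]
negative_output = ...  # shape [None, 128]

# Squared Euclidean distances d(a, p) and d(a, n)
d_pos = tf.reduce_sum(tf.square(anchor_output - positive_output), 1)
d_neg = tf.reduce_sum(tf.square(anchor_output - negative_output), 1)

# Per-triplet loss, then the mean over the batch
loss = tf.maximum(0.0, margin + d_pos - d_neg)
loss = tf.reduce_mean(loss)
```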

4.2 online triplets

4.2.1 Batch all implementation

```python
import tensorflow as tf


def batch_all_triplet_loss(labels, embeddings, margin, squared=False):
    """Build the triplet loss over a batch of embeddings.

    We generate all the valid triplets and average the loss over the positive ones.

    Args:
        labels: labels of the batch, of size (batch_size,)
        embeddings: tensor of shape (batch_size, embed_dim)
        margin: margin for triplet loss
        squared: Boolean. If true, output is the pairwise squared euclidean distance matrix.
                 If false, output is the pairwise euclidean distance matrix.

    Returns:
        triplet_loss: scalar tensor containing the triplet loss
        fraction_positive_triplets: scalar tensor, fraction of valid triplets with loss > 0
    """
    # Get the pairwise distance matrix
    # (_pairwise_distances and _get_triplet_mask are helpers that are not shown here)
    pairwise_dist = _pairwise_distances(embeddings, squared=squared)

    anchor_positive_dist = tf.expand_dims(pairwise_dist, 2)
    anchor_negative_dist = tf.expand_dims(pairwise_dist, 1)

    # Compute a 3D tensor of size (batch_size, batch_size, batch_size)
    # triplet_loss[i, j, k] will contain the triplet loss of anchor=i, positive=j, negative=k
    # Uses broadcasting where the 1st argument has shape (batch_size, batch_size, 1)
    # and the 2nd (batch_size, 1, batch_size)
    triplet_loss = anchor_positive_dist - anchor_negative_dist + margin

    # Put to zero the invalid triplets
    # (where label(a) != label(p) or label(n) == label(a) or a == p)
    mask = _get_triplet_mask(labels)
    mask = tf.cast(mask, tf.float32)  # replaces the deprecated tf.to_float
    triplet_loss = tf.multiply(mask, triplet_loss)

    # Remove negative losses (i.e. the easy triplets)
    triplet_loss = tf.maximum(triplet_loss, 0.0)

    # Count number of positive triplets (where triplet_loss > 0)
    valid_triplets = tf.cast(tf.greater(triplet_loss, 1e-16), tf.float32)
    num_positive_triplets = tf.reduce_sum(valid_triplets)
    num_valid_triplets = tf.reduce_sum(mask)
    fraction_positive_triplets = num_positive_triplets / (num_valid_triplets + 1e-16)

    # Get final mean triplet loss over the positive valid triplets
    triplet_loss = tf.reduce_sum(triplet_loss) / (num_positive_triplets + 1e-16)

    return triplet_loss, fraction_positive_triplets
```
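The code above relies on a helper, `_pairwise_distances`, whose body is not shown in this article. As a rough plain-Python stand-in (not the TensorFlow helper itself), one common way to build the full distance matrix uses the identity ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2. The name `pairwise_distances` and the toy embeddings below are assumptions for illustration.

```python
import math


def pairwise_distances(embeddings, squared=False):
    """Distance matrix via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.

    embeddings: list of equal-length lists of floats
    Returns a square list-of-lists matrix of (squared) Euclidean distances.
    """
    n = len(embeddings)
    # dot[i][j] = <e_i, e_j>; the diagonal holds the squared norms
    dot = [[sum(x * y for x, y in zip(u, v)) for v in embeddings]
           for u in embeddings]
    # Clamp at 0 to guard against small negative values from rounding
    dist = [[max(dot[i][i] - 2.0 * dot[i][j] + dot[j][j], 0.0)
             for j in range(n)] for i in range(n)]
    if not squared:
        dist = [[math.sqrt(d) for d in row] for row in dist]
    return dist


emb = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
print(pairwise_distances(emb)[0][1])                # 5.0
print(pairwise_distances(emb, squared=True)[0][2])  # 100.0
```

Computing the matrix once and indexing into it is what lets the batch all loss be written as a single broadcasted subtraction over a (batch_size, batch_size, batch_size) tensor.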

4.2.2 Batch hard implementation

```python
import tensorflow as tf


def batch_hard_triplet_loss(labels, embeddings, margin, squared=False):
    """Build the triplet loss over a batch of embeddings.

    For each anchor, we get the hardest positive and hardest negative to form a triplet.

    Args:
        labels: labels of the batch, of size (batch_size,)
        embeddings: tensor of shape (batch_size, embed_dim)
        margin: margin for triplet loss
        squared: Boolean. If true, output is the pairwise squared euclidean distance matrix.
                 If false, output is the pairwise euclidean distance matrix.

    Returns:
        triplet_loss: scalar tensor containing the triplet loss
    """
    # Get the pairwise distance matrix
    # (_pairwise_distances and the two mask helpers below are not shown here)
    pairwise_dist = _pairwise_distances(embeddings, squared=squared)

    # For each anchor, get the hardest positive
    # First, we need to get a mask for every valid positive (they should have same label)
    mask_anchor_positive = _get_anchor_positive_triplet_mask(labels)
    mask_anchor_positive = tf.cast(mask_anchor_positive, tf.float32)  # replaces tf.to_float

    # We put to 0 any element where (a, p) is not valid (valid if a != p and label(a) == label(p))
    anchor_positive_dist = tf.multiply(mask_anchor_positive, pairwise_dist)

    # shape (batch_size, 1)
    hardest_positive_dist = tf.reduce_max(anchor_positive_dist, axis=1, keepdims=True)

    # For each anchor, get the hardest negative
    # First, we need to get a mask for every valid negative (they should have different labels)
    mask_anchor_negative = _get_anchor_negative_triplet_mask(labels)
    mask_anchor_negative = tf.cast(mask_anchor_negative, tf.float32)

    # We add the maximum value in each row to the invalid negatives (label(a) == label(n))
    max_anchor_negative_dist = tf.reduce_max(pairwise_dist, axis=1, keepdims=True)
    anchor_negative_dist = pairwise_dist + max_anchor_negative_dist * (1.0 - mask_anchor_negative)

    # shape (batch_size, 1)
    hardest_negative_dist = tf.reduce_min(anchor_negative_dist, axis=1, keepdims=True)

    # Combine biggest d(a, p) and smallest d(a, n) into final triplet loss
    triplet_loss = tf.maximum(hardest_positive_dist - hardest_negative_dist + margin, 0.0)

    # Get final mean triplet loss
    triplet_loss = tf.reduce_mean(triplet_loss)

    return triplet_loss
```
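The least obvious step in the batch hard code is how the hardest negative is found: multiplying by the mask would set invalid entries to 0, and the row minimum would then always pick them. Adding the row maximum to the invalid entries instead takes them out of contention. The plain-Python sketch below mirrors that trick on a toy distance matrix; the function name `hardest_negative_distances` and the data are hypothetical.

```python
def hardest_negative_distances(labels, dist):
    """Row-wise minimum over valid negatives, using the 'add the row max
    to invalid entries' trick instead of filtering them out."""
    n = len(labels)
    result = []
    for a in range(n):
        row_max = max(dist[a])
        shifted = []
        for k in range(n):
            is_valid_negative = 1.0 if labels[k] != labels[a] else 0.0
            # Invalid entries (same label, including the anchor itself) are
            # pushed up by row_max, so they cannot beat a real negative.
            shifted.append(dist[a][k] + row_max * (1.0 - is_valid_negative))
        result.append(min(shifted))
    return result


labels = [0, 0, 1, 1]
points = [0.0, 1.0, 2.0, 5.0]   # toy 1-D embeddings
dist = [[abs(x - y) for y in points] for x in points]

print(hardest_negative_distances(labels, dist))  # [2.0, 1.0, 1.0, 4.0]
```

A plain row minimum would return 0.0 for every anchor, because each example is at distance 0 from itself.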

Both implementations work well on datasets such as MNIST.

5. Summary

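Implementing triplet loss is not trivial. The tricky parts are computing the distances between embeddings and identifying and discarding the invalid and easy triplets. If you use TensorFlow, you can go straight to the [github repository](omoindrot/tensorflow-triplet-loss), where a ready-made triplet loss implementation is waiting for you.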

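Some readers may wonder: the inputs to a Siamese network or a triplet network are pairs or triplets, so how do you classify a single example? The strength of neural networks is representation learning, that is, automatic feature extraction. Training on pairs or triplets lets the network learn a better representation of its inputs, and a classifier such as an SVM or logistic regression can then be trained on top of those embeddings.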
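To make the last step concrete, here is a sketch of classifying a single example from its embedding. It deliberately swaps in a nearest-centroid rule rather than the SVM or logistic regression mentioned above, only because that fits in a few lines of plain Python. The function names and the toy two-dimensional embeddings are hypothetical; in practice the embeddings would come from the trained network.

```python
def nearest_centroid_fit(embeddings, labels):
    """Average the embeddings of each class into one centroid per label."""
    sums, counts = {}, {}
    for e, y in zip(embeddings, labels):
        if y not in sums:
            sums[y] = [0.0] * len(e)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], e)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def nearest_centroid_predict(centroids, e):
    """Return the label whose centroid is closest to embedding e."""
    return min(
        centroids,
        key=lambda y: sum((c - x) ** 2 for c, x in zip(centroids[y], e)),
    )


train_emb = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
train_labels = ["a", "a", "b", "b"]
centroids = nearest_centroid_fit(train_emb, train_labels)

print(nearest_centroid_predict(centroids, [1.0, 1.0]))  # a
print(nearest_centroid_predict(centroids, [9.0, 9.0]))  # b
```

A rule this simple can only work if same-class embeddings are close together and different-class embeddings are far apart, which is the property that triplet loss trains for.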
