- paper1: Matching Networks for One Shot Learning (a paper from Google DeepMind)
- paper2: Data Augmentation Generative Adversarial Networks (DAGAN)
- paper3: MetaGAN: An Adversarial Approach to Few-Shot Learning (NIPS 2018)
Matching Networks for One Shot Learning
Abstract
In this work, we employ ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories. Our framework learns a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.
Matching Networks performs few-shot classification such that the trained model can classify examples from classes never seen during training, without any fine-tuning.
Introduction
- Deep Learning: learning is slow and based on large datasets (many weight updates using stochastic gradient descent). This is mostly due to the parametric aspect of the model, in which training examples need to be slowly learnt by the model into its parameters; moreover, each sample is discarded once used.
- Non-parametric models: the authors contrast this with a nearest-neighbour classifier. For NN, samples are stored exactly as they arrive and no training is needed, so new classes can be learned quickly.
- The goal of this paper is to combine the strengths of parametric and non-parametric models.
- Two contributions: the Matching Net model (learning a representation of each sample, i.e., encoding the samples) and a new task-based training and testing regime.
Main novelty: We propose Matching Nets (MN), a neural network which uses recent advances in attention and memory that enable rapid learning.
Model
Model Architecture
Contribution: one-shot learning within the set-to-set framework
Simplest form of the model: as a function, $\hat{y} = \sum_{i=1}^{k} a(\hat{x}, x_i)\, y_i$, where $S = \{(x_i, y_i)\}_{i=1}^{k}$ is the support set; as a probability, $P(\hat{y} \mid \hat{x}, S)$.
Here $a(\hat{x}, x_i)$, computed against the support set, can be viewed as an attention kernel.
For a given unseen input example $\hat{x}$, the predicted output class is $\hat{y} = \arg\max_{y} P(y \mid \hat{x}, S)$, where P is a parametric neural network.
Attention Kernel
Use the softmax over the cosine similarity: $a(\hat{x}, x_i) = e^{c(f(\hat{x}), g(x_i))} / \sum_{j=1}^{k} e^{c(f(\hat{x}), g(x_j))}$, where $c$ is the cosine similarity and f, g are the two embedding functions shown in Figure 1.
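To make the kernel concrete, here is a minimal NumPy sketch of this attention classifier (not the paper's code); the learned embeddings f and g are replaced by precomputed vectors, and all names are illustrative.

```python
import numpy as np

def attention_kernel(query_emb, support_embs):
    """Softmax over cosine similarities between f(x_hat) and each g(x_i)."""
    q = query_emb / np.linalg.norm(query_emb)
    s = support_embs / np.linalg.norm(support_embs, axis=1, keepdims=True)
    sims = s @ q                       # cosine similarities, shape (k,)
    e = np.exp(sims - sims.max())      # numerically stable softmax
    return e / e.sum()

def predict(query_emb, support_embs, support_labels_onehot):
    """P(y | x_hat, S) = sum_i a(x_hat, x_i) * y_i."""
    a = attention_kernel(query_emb, support_embs)   # (k,)
    probs = a @ support_labels_onehot               # (num_classes,)
    return probs.argmax(), probs

# Toy 2-way 1-shot example with random stand-in "embeddings".
rng = np.random.default_rng(0)
support = rng.normal(size=(2, 64))               # g(x_i), k = 2
labels = np.eye(2)                               # one-hot y_i
query = support[1] + 0.1 * rng.normal(size=64)   # f(x_hat), near class 1
print(predict(query, support, labels))           # predicts class 1
```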
Full Context Embeddings
(I did not fully understand the internal structure; that would require a deeper understanding of LSTMs. The notes here are a high-level view.)
Full Context Embeddings f
Full Context Embeddings g
- Bidirectional LSTM: embeds the support set so that the embedding of each support example is a function of the other support examples (a sketch of this g follows the list);
- Attention LSTM: embeds the test example so that its embedding is a function of the support-set embeddings.
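A minimal PyTorch sketch of the full-context g: a bidirectional LSTM run over the support-set embeddings, with the paper's skip connection $g(x_i) = h^{fwd}_i + h^{bwd}_i + g'(x_i)$. The base embedding g' would come from a CNN; here it is stubbed out with random tensors, and the class name is an assumption.

```python
import torch
import torch.nn as nn

class FullContextG(nn.Module):
    """g(x_i) = h_i(forward) + h_i(backward) + g'(x_i): each support
    embedding becomes a function of the whole support set."""
    def __init__(self, emb_dim=64):
        super().__init__()
        # hidden_size = emb_dim so the skip connection dimensions match
        self.bilstm = nn.LSTM(emb_dim, emb_dim, bidirectional=True,
                              batch_first=True)

    def forward(self, g_prime):                       # g_prime: (k, emb_dim)
        out, _ = self.bilstm(g_prime.unsqueeze(0))    # (1, k, 2 * emb_dim)
        h_fwd, h_bwd = out[0].chunk(2, dim=-1)        # each (k, emb_dim)
        return h_fwd + h_bwd + g_prime                # skip connection

g = FullContextG()
support_embs = torch.randn(5, 64)   # stand-in for g'(x_i) from a base CNN
print(g(support_embs).shape)        # torch.Size([5, 64])
```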
Training Strategy
- A task T is a distribution over possible label sets L (all the possible combinations of labels).
- $L \sim T$: a label set L sampled from the task distribution T.
- $S \sim L$, $B \sim L$: use L to sample the support set S and a batch B.
- A batch B spans multiple tasks; each task has one support set S and one test example. For one-shot learning, the support set contains exactly one example of the same class as the test example.
The Matching Net is then trained to minimise the error of predicting the labels in the batch B conditioned on the support set S. Put differently, this is a form of meta-learning: learning to learn from a given support set so as to minimise a loss over the batch.
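A schematic of this episodic sampling; the dataset layout (`data_by_class`: class → list of examples) and the commented-out training loop are assumptions, not the paper's code.

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, batch_size=15):
    """One episode: sample a label set L from the task, then a support
    set S and a target batch B from those same classes."""
    label_set = random.sample(list(data_by_class), n_way)        # L ~ T
    per_class = batch_size // n_way
    support, batch = [], []
    for cls in label_set:
        examples = random.sample(data_by_class[cls], k_shot + per_class)
        support += [(x, cls) for x in examples[:k_shot]]          # S ~ L
        batch += [(x, cls) for x in examples[k_shot:]]            # B ~ L
    return support, batch

data = {c: list(range(10)) for c in "abcdefg"}   # toy dataset: 7 classes
S, B = sample_episode(data)
print(len(S), len(B))                            # 5 support, 15 batch

# Training loop (schematic): minimise the batch error conditioned on S.
# for step in range(num_steps):
#     S, B = sample_episode(train_data_by_class)
#     loss = cross_entropy(model(B, conditioned_on=S), labels(B))
#     loss.backward(); optimizer.step()
```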
DAGAN
Introduction
The gist of Figure 1: a manifold learned on a source domain can be used to implement, and effectively improve, a matching network in a few-shot target domain. A DAGAN can augment the data used by matching networks and related models (by generating, for each class, the most relevant comparison points), which involves the notion of tangent distance. Learning distances along the manifold is central to the DAGAN objective.
Figure 2 introduces the concept of dataset shift: how covariate shift plays out across multiple domains. (For one-shot learning, the class distribution changes in an extreme way: the two distributions have no common support. One must therefore assume the class-conditional distributions share some commonality before information can be transferred from the source domain to the one-shot target domain to generate new data.)
The paper then recalls the idea behind typical data augmentation techniques: exploiting known invariances within data classes. This motivates the DAGAN: train a GAN across different source domains to learn a model of a larger invariance space. The trained DAGAN does not depend on the classes themselves; it captures cross-class transformations that move a data point to other points of the same class, as in the sketch below.
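To illustrate the class-agnostic interface this implies, here is a schematic generator (not the paper's UResNet architecture; the linear layers, dimensions, and names below are placeholder assumptions):

```python
import torch
import torch.nn as nn

class DAGANGenerator(nn.Module):
    """Schematic DAGAN generator: encode a single data point x, concatenate
    noise z, and decode an augmented same-class variant. No class labels are
    used, so the learned transformations can transfer across classes."""
    def __init__(self, dim=64, noise_dim=16):
        super().__init__()
        self.encode = nn.Linear(dim, dim)              # stand-in encoder
        self.decode = nn.Linear(dim + noise_dim, dim)  # stand-in decoder

    def forward(self, x, z):
        h = torch.relu(self.encode(x))
        return self.decode(torch.cat([h, z], dim=-1))

G = DAGANGenerator()
x = torch.randn(1, 64)     # a single novel data point
z = torch.randn(1, 16)
x_aug = G(x, z)            # an augmented sample from the same class as x
```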
Contributions
- Using a GAN to learn a representation and a process for data augmentation.
- Generating augmented samples from just a single novel data point.
- Preserving task generalisation even in low-data settings.
- Applying the DAGAN in the meta-learning space, outperforming all previous general-purpose meta-learning models.
To our knowledge, this is the first paper to demonstrate state-of-the-art performance on meta-learning via novel data augmentation strategies.
Background
Transfer Learning and Dataset Shift: the term dataset shift (Storkey, 2009) generalises the concept of covariate shift (this section explains covariate shift).
Data Augmentation: almost all data augmentation exploits a priori known invariances.
Models
Learning
The paper stresses the importance of feeding the original data point to D, to prevent the GAN from simply auto-encoding the current data point.
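A toy sketch of that pairing: the critic always sees the source point x alongside the candidate, so a generator that merely reconstructs x is trivially detectable. The lambdas, names, and the gradient-penalty-free loss are simplifications and assumptions; the paper uses a WGAN-GP critic.

```python
import torch

def dagan_critic_loss(D, G, x, x_same_class, z):
    """D scores (x, another real point of the same class) as real and
    (x, G(x, z)) as fake; simplified WGAN critic loss, no gradient penalty."""
    real_score = D(x, x_same_class)        # real pair
    fake_score = D(x, G(x, z).detach())    # generated pair
    return fake_score.mean() - real_score.mean()

# Toy stand-ins just to show the call shapes (not the paper's networks):
D = lambda a, b: torch.cat([a, b], dim=-1).sum(dim=-1, keepdim=True)
G = lambda x, z: x + 0.1 * z.sum(dim=-1, keepdim=True)
x, x2, z = torch.randn(4, 8), torch.randn(4, 8), torch.randn(4, 3)
print(dagan_critic_loss(D, G, x, x2, z))
```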
Architecture
G: a combination of a UNet and ResNet (UResNet)
D: uses layer normalization rather than batch normalization (the latter would break the assumptions of the WGAN objective function).
Conclusions
- DAGANs improve the performance of classifiers even after standard data augmentation.
- The generality of data augmentation across all models and methods means a DAGAN could be a valuable addition to any low-data setting.
MetaGAN
I could not find any analysis of this paper online. My own reading is that it is fairly theoretical: it proposes applying GANs to meta-learning. It borrows the meta-learning training regime, and as a whole looks much like a semi-supervised GAN.
Core Idea
Through adversarial training, the discriminator learns a sharper decision boundary.
Introduction
Problem: adapting to new tasks within a few steps and with scarce data.
Solution: meta-learning: train an adaptation strategy over a distribution of similar tasks, trying to extract transferable patterns useful across many tasks.
For an overview of current few-shot learning methods, see the post 当小样本遇上机器学习 (few-shot learning).
Many existing few-shot learning models consider how to do supervised learning with few samples. The MetaGAN framework combines supervised and semi-supervised learning: through adversarial training, the fake data produced by G helps the classifier learn sharper decision boundaries, at both the sample level and the task level.
For an intuition of the sharper decision boundary, see the corresponding figure in the paper.
BACKGROUND
Few-Shot Learning Definition
Approach
Increase the dimension of the classifier output from N to N + 1, to model the probability that the input data is fake. (Adding an extra output to the classifier; this is why I said the idea resembles semi-supervised GANs.)
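A minimal sketch of the resulting (N+1)-way discriminator loss, in the style of semi-supervised GAN classifiers; the function and variable names are illustrative assumptions, and the paper's full formulation also has task-level terms not shown here.

```python
import torch
import torch.nn.functional as F

def metagan_discriminator_loss(logits_real, labels_real, logits_fake, n_way):
    """Classifier outputs N+1 logits: indices 0..N-1 are the task's real
    classes, index N means 'fake'. Real data gets cross-entropy on its true
    class; generated data is pushed toward the fake class."""
    loss_real = F.cross_entropy(logits_real, labels_real)
    fake_targets = torch.full((logits_fake.size(0),), n_way, dtype=torch.long)
    loss_fake = F.cross_entropy(logits_fake, fake_targets)
    return loss_real + loss_fake

# Toy 5-way example: logits have 5 + 1 = 6 columns.
n_way = 5
logits_real = torch.randn(8, n_way + 1)
labels_real = torch.randint(0, n_way, (8,))
logits_fake = torch.randn(4, n_way + 1)
print(metagan_discriminator_loss(logits_real, labels_real, logits_fake, n_way))
```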
Basic Algorithm
Choice of Discriminator
In theory the choice is unrestricted; the paper uses:
- MAML: representative of fast-fine-tuning based models
- Relation Networks: representative of models that learn a shared embedding and metric
Choice of Generator
Conditional generative model
WHY DOES METAGAN WORK?
Finally, the authors analyse why MetaGAN works. The intuition is the figure mentioned above; the authors back it up with substantial mathematical proofs, which I found hard to follow, so I will not attempt to reproduce them here.
Experiments
- Sample-level
- Task-level
Results are strong at both the sample level and the task level.