Several mechanisms to focus the attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years. Attention has improved image classification, image captioning, speech recognition, generative models, and the learning of algorithmic tasks, but it has probably had the largest impact on neural machine translation.
Recently, similar improvements have been obtained with alternative mechanisms that do not focus on a single part of a memory but operate on all of it in parallel, in a uniform way. Such a mechanism, which we call active memory, has improved over attention on algorithmic tasks, image processing, and generative modelling.
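To make the contrast concrete, here is a minimal NumPy sketch (not taken from the paper) of the two kinds of memory access: attention scores every memory cell and reads a single softmax-focused summary, while an active memory step applies the same operator to every cell in parallel. The choice of a 1-D convolution with a tanh nonlinearity, and the names `attention_read`, `active_memory_step`, `memory`, `query`, and `kernel`, are illustrative assumptions, not the paper's exact model.

```python
# Minimal sketch: attention vs. active memory over a memory of n cells of width d.
import numpy as np

def attention_read(memory, query):
    """Soft attention: score every cell, then read one weighted summary.

    memory: (n, d) array of n memory cells; query: (d,) vector.
    The softmax concentrates the read on a few (often one) cells.
    """
    scores = memory @ query                      # (n,) similarity of each cell to the query
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over cells
    return weights @ memory                      # focused read, shape (d,)

def active_memory_step(memory, kernel):
    """Active memory: apply the same local operator to every cell in parallel.

    memory: (n, d); kernel: (k, d, d) convolution weights over a window of k cells.
    Every cell is rewritten uniformly; no single cell is singled out.
    """
    n, d = memory.shape
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(memory, ((pad, pad), (0, 0)))
    out = np.zeros_like(memory)
    for i in range(n):                           # the same operation at every position
        window = padded[i:i + k]                 # (k, d) neighbourhood of cell i
        out[i] = np.tanh(np.einsum('kd,kde->e', window, kernel))
    return out

# Toy usage: 6 memory cells of width 4.
rng = np.random.default_rng(0)
mem = rng.standard_normal((6, 4))
print(attention_read(mem, rng.standard_normal(4)).shape)               # (4,)
print(active_memory_step(mem, rng.standard_normal((3, 4, 4))).shape)   # (6, 4)
```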
So far, however, active memory has not improved over attention for most natural language processing tasks, in particular for machine translation. In this paper we analyze this shortcoming and propose an extended model of active memory that matches existing attention models on neural machine translation and generalizes better to longer sentences. We investigate this model and explain why previous active memory models did not succeed. Finally, we discuss when active memory brings the most benefit and where attention can be a better choice.
The attention mechanism is a technique that focuses a neural network's attention on selected parts of its input or memory, and it has worked very well in deep learning. It has brought performance gains to image classification, image captioning, speech recognition, generative models, and the learning of algorithmic tasks, with the most pronounced improvement in neural machine translation.
Recently, similar improvements have been achieved with a different mechanism, one that does not focus on a single part of memory but instead operates on all of it in parallel, in a uniform way. We call this active memory; it has delivered better results than attention on algorithmic tasks, image processing, and generative modelling.
So far, however, active memory has not improved most natural language processing tasks, machine translation in particular. We analyze the reason for this and propose an extended active memory model that matches attention-based neural machine translation and generalizes better to longer sentences. We then investigate this model and explain why earlier active memory models failed. Finally, we discuss where active memory brings benefits and point out the applications where attention remains the better choice.