The GRU (gated recurrent unit) was introduced in the paper "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". This post summarizes the paper's main contributions.
RNN Encoder–Decoder
Hidden Unit that Adaptively Remembers and Forgets
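The paper then proposes a hidden unit that adaptively remembers and forgets. The main idea is to give every unit its own mechanism for remembering and forgetting, so that the network can learn both short-term and long-term features. In units that capture short-term dependencies, the reset gate is frequently active; in units that capture long-term dependencies, the update gate is active most of the time.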
As each hidden unit has separate reset and update gates, each hidden unit will learn to capture dependencies over different time scales. Those units that learn to capture short-term dependencies will tend to have reset gates that are frequently active, but those that capture longer-term dependencies will have update gates that are mostly active.
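The gating described above can be sketched as a single recurrence step. The following is a minimal NumPy sketch, not the authors' implementation: it follows the paper's convention in which an update gate close to 1 keeps the previous state, and it assumes `tanh` as the candidate activation and no bias terms. The function names, the parameter dictionary, and the random initialization are hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-a))


def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights; names and scale are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    shapes = {
        "W_r": (hidden_dim, input_dim), "U_r": (hidden_dim, hidden_dim),
        "W_z": (hidden_dim, input_dim), "U_z": (hidden_dim, hidden_dim),
        "W": (hidden_dim, input_dim), "U": (hidden_dim, hidden_dim),
    }
    return {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}


def gru_step(x, h_prev, p):
    """One time step of a GRU-style unit (no biases, tanh candidate)."""
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(p["W_r"] @ x + p["U_r"] @ h_prev)
    # Update gate: how much of the previous state is carried over unchanged.
    z = sigmoid(p["W_z"] @ x + p["U_z"] @ h_prev)
    # Candidate state, computed from the input and the reset-scaled previous state.
    h_tilde = np.tanh(p["W"] @ x + p["U"] @ (r * h_prev))
    # Interpolate: z near 1 keeps h_prev, z near 0 switches to the candidate.
    return z * h_prev + (1.0 - z) * h_tilde


# Usage: run the step over a short random sequence of 5 inputs.
params = init_params(input_dim=3, hidden_dim=4)
h = np.zeros(4)
for x in np.random.default_rng(1).standard_normal((5, 3)):
    h = gru_step(x, h, params)
```

Each hidden unit has its own entries in `r` and `z`, so different units can settle on different time scales. Some libraries use the opposite convention for the update gate (weighting the candidate by `z` instead of `1 - z`); the two are equivalent up to relabeling.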
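- It keeps the basic idea of gated units (mechanisms for forgetting and updating) while simplifying the network structure relative to the LSTM unit that motivated it.
- It uses the update gate to let each unit learn long-term or short-term features, which reduces the risk of vanishing gradients.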
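
A short derivation suggests why the update gate helps with vanishing gradients. This is a sketch in assumed notation, not taken from the post: $z_t$ is the update gate, $\tilde h_t$ the candidate state, and the state update follows the paper's convention $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde h_t$. Differentiating with respect to the previous state gives

$$
\frac{\partial h_t}{\partial h_{t-1}}
= \operatorname{diag}(z_t)
+ \operatorname{diag}(h_{t-1} - \tilde h_t)\,\frac{\partial z_t}{\partial h_{t-1}}
+ \operatorname{diag}(1 - z_t)\,\frac{\partial \tilde h_t}{\partial h_{t-1}}
$$

For a unit whose update gate stays close to 1, the first term is close to 1 on the diagonal, while the other two terms become small (the sigmoid saturates and $1 - z_t$ approaches 0). The Jacobian for that unit is then close to the identity, so the gradient can pass through many time steps without shrinking multiplicatively.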