论文链接:http://pdfs.semanticscholar.org/b28f/7e2996b6ee2784dd2dbb8212cfa0c79ba9e7.pdf
This paper proposes a deep memory network for aspect level sentiment classification. Unlike feature-based SVM and sequential neural models such as LSTM, this method explicitly captures the importance of each context word when inferring the sentiment polarity of an aspect. The model consists of multiple computational layers over an external memory.
Introduction:
Given a sentence and an aspect occurring in the sentence, this task aims at inferring the sentiment polarity (e.g. positive, negative, neutral) of the aspect.
The approach is data-driven and does not rely on a syntactic parser or a sentiment lexicon. It consists of multiple computational layers whose parameters are shared.
Each layer is a content- and location-based attention model, which first learns the importance/weight of each context word and then utilizes this information to calculate the continuous text representation.
The text representation of the last layer is regarded as the feature for sentiment classification.
Deep memory network for aspect level sentiment classification:
1. Task definition and notation:
Given a sentence s = {w1, w2, ..., wi, ..., wn} consisting of n words and an aspect word wi occurring in s, aspect level sentiment classification aims at determining the sentiment polarity of sentence s towards the aspect wi. (In practice an aspect may consist of multiple words, e.g. "battery life"; to simplify the problem, this paper treats an aspect as a single word.)
2. An overview of the approach
The word vectors are divided into two parts: the aspect representation and the context representation. If the aspect is a single word such as "food" or "service", the aspect representation is the embedding of that word; for a multi-word aspect such as "battery life", the aspect representation is the average of its word embeddings. This paper only considers the single-word case.
The context word vectors {e1, e2, ..., ei-1, ei+1, ..., en} are stacked and regarded as the external memory m, where n is the sentence length. A small sketch of this split is given below.
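A minimal numpy sketch of this aspect/context split, using the paper's example sentence; the embedding size and random embeddings are illustrative assumptions:

```python
import numpy as np

d = 100                                                 # embedding size (illustrative)
sentence = ["great", "food", "but", "the", "service", "was", "dreadful"]
aspect_words = ["service"]                              # single-word aspect, as in these notes
emb = {w: np.random.randn(d) for w in sentence}         # stand-in word embeddings

# Aspect representation: the aspect word's embedding
# (for a multi-word aspect it would be the average of its word embeddings)
v_aspect = np.mean([emb[w] for w in aspect_words], axis=0)

# External memory: stacked embeddings of the context words (sentence minus the aspect)
memory = np.stack([emb[w] for w in sentence if w not in aspect_words])
print(memory.shape)                                     # (6, 100): one row per context word
```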
The approach consists of multiple layers (hops), each of which contains an attention layer and a linear layer.
In the first computational layer (hop), the aspect vector is taken as the input to the attention layer, which adaptively selects important information from the memory m. The output of the attention layer is summed with the linear transformation of the aspect vector, and the result is taken as the input of the next layer. The output of the last hop is regarded as the representation of the sentence with regard to the aspect. The parameters of the attention and linear layers are shared across hops.
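A sketch of the multi-hop computation under these definitions; the dot-product scorer inside content_attention is only a stand-in for the feed-forward scorer of Section 3, and all sizes are illustrative:

```python
import numpy as np

d, n_context, n_hops = 100, 10, 3            # embedding size, context length, hops (illustrative)
memory = np.random.randn(n_context, d)       # external memory: stacked context word embeddings
v_aspect = np.random.randn(d)                # aspect word embedding

W_linear = np.random.randn(d, d)             # linear-layer weight, shared across hops

def content_attention(memory, query):
    """Stand-in attention: weighted sum of memory rows (the real scorer is the FFN of Section 3)."""
    scores = memory @ query                               # relevance of each memory piece to the query
    alpha = np.exp(scores) / np.exp(scores).sum()         # softmax over memory pieces
    return alpha @ memory                                 # weighted sum, shape (d,)

x = v_aspect
for _ in range(n_hops):
    # attention output + linear transform of the previous hop's output becomes the next input
    x = content_attention(memory, x) + W_linear @ x
# x: final representation of the sentence w.r.t. the aspect, fed to the sentiment classifier
```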
3. Content attention
The basic idea of the attention mechanism is that it assigns a weight/importance to each lower position when computing an upper level representation.
Input: the external memory m ∈ R^{d×k} and an aspect vector v_aspect ∈ R^{d×1}. The output is a weighted sum of the pieces of memory.
k is the memory size; α_i ∈ [0, 1] is the weight of m_i, and the α_i sum to 1. For each piece of memory m_i, a feed-forward neural network is used to compute its semantic relatedness with the aspect. The scoring function is g_i = tanh(W_att [m_i; v_aspect] + b_att), where W_att ∈ R^{1×2d} and b_att ∈ R^{1×1}.
After obtaining {g1, g2, ..., gk}, a softmax is used to compute the importance scores: α_i = exp(g_i) / Σ_j exp(g_j).
The attention model has two advantages. One is that it can adaptively assign an importance score to each piece of memory m_i according to its semantic relatedness with the aspect; the other is that the attention model is differentiable, so it can be trained end-to-end.
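A minimal sketch of the content attention layer under the definitions above; W_att, b_att, and the sizes are randomly initialized here for illustration (in the model they are learned parameters):

```python
import numpy as np

d, k = 100, 10                       # embedding size, memory size (illustrative)
memory = np.random.randn(k, d)       # m_1 ... m_k
v_aspect = np.random.randn(d)

W_att = np.random.randn(1, 2 * d)    # scorer parameters (learned in the actual model)
b_att = np.random.randn(1)

# g_i = tanh(W_att [m_i ; v_aspect] + b_att): feed-forward scorer for each memory piece
g = np.array([np.tanh(W_att @ np.concatenate([m_i, v_aspect]) + b_att).item()
              for m_i in memory])

alpha = np.exp(g) / np.exp(g).sum()  # softmax -> importance scores that sum to 1
output = alpha @ memory              # weighted sum of the memory pieces, shape (d,)
```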
4. Location Attention
Model 1: from the End-to-End Memory Network.
v_i ∈ R^{d×1} is the location vector of w_i; the memory piece is m_i = e_i ⊙ v_i, where v_i^k = (1 - l_i/n) - (k/d)(1 - 2·l_i/n).
Here n is the sentence length, k is the hop number, and l_i is the location of w_i.
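A sketch of Model 1's location weighting, reading the superscript k as the hop number as stated above; broadcasting the scalar weight over the embedding dimensions is a simplifying assumption of this sketch:

```python
import numpy as np

d, n, hop = 100, 10, 1               # embedding size, sentence length, hop number k (illustrative)

def location_weight(l_i, k, n, d):
    """Model 1 location weight for the word at position l_i (1-indexed), hop k."""
    return (1 - l_i / n) - (k / d) * (1 - 2 * l_i / n)

e = np.random.randn(n, d)            # context word embeddings e_i
v = np.array([location_weight(l_i, hop, n, d) for l_i in range(1, n + 1)])
memory = e * v[:, None]              # m_i = e_i ⊙ v_i (scalar weight broadcast over dimensions)
```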
Model 2: a simplified version of Model 1 that uses the same location vector across different hops.
Model 3:
The location vectors are regarded as parameters. All the position vectors are stacked into a position embedding matrix, which is jointly learned with gradient descent.
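A minimal sketch of Model 3's position embedding lookup; the matrix would be updated by gradient descent during training, and how the position vector is combined with the word embedding is not spelled out in these notes, so element-wise addition is an assumption here:

```python
import numpy as np

d, max_len = 100, 50                              # embedding size, max sentence length (illustrative)
position_embedding = np.random.randn(max_len, d)  # parameter matrix, jointly learned in the real model

n = 10
e = np.random.randn(n, d)                         # context word embeddings
v = position_embedding[:n]                        # v_i looked up by the position of w_i
memory = e + v                                    # combine word and position vectors (addition assumed)
```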
Model 4:
The location vectors are also regarded as parameters. Different from Model 3, the location representation is treated as a neural gate that controls how much of the word semantics is written into the memory: the location vector is fed to a sigmoid function, and m_i is the element-wise multiplication of e_i with the gate.
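A sketch of Model 4's gating; the shapes are illustrative and the location vectors, which are learned parameters in the model, are randomly initialized here:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d, n = 100, 10
e = np.random.randn(n, d)     # context word embeddings e_i
v = np.random.randn(n, d)     # location vectors (learned parameters in the model, random here)
memory = e * sigmoid(v)       # m_i = e_i ⊙ sigmoid(v_i): the gate decides how much semantics is written
```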
5. The need for multiple hops
It is widely accepted that computational models that are composed of multiple processing layers have the ability to learn representations of data with multiple levels of abstraction.
T is the collection of all training instances and C is the collection of sentiment categories; (s, a) denotes a sentence-aspect pair.
P_c(s, a) is the probability of predicting (s, a) as sentiment category c, and P_c^g(s, a) is 1 or 0, indicating whether the correct answer is c.
The model is trained by minimizing the cross-entropy loss: loss = -Σ_{(s,a)∈T} Σ_{c∈C} P_c^g(s, a) · log P_c(s, a).
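A toy numpy check of the cross-entropy objective above, using made-up predicted probabilities and one-hot gold labels:

```python
import numpy as np

# Toy example: 3 sentiment categories, 2 (sentence, aspect) training pairs
probs = np.array([[0.7, 0.2, 0.1],   # P_c(s, a): predicted distribution for each pair
                  [0.3, 0.5, 0.2]])
gold = np.array([[1, 0, 0],          # P_c^g(s, a): 1 iff c is the correct category
                 [0, 1, 0]])

loss = -np.sum(gold * np.log(probs)) # cross-entropy summed over pairs and categories
print(loss)
```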