Source: IJCAI-16
The approach is based on unsupervised learning of distributed representations of words and dependency paths.
Basic idea: in the dependency embedding space, two words are connected through the dependency path between them.
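The model optimizes w1 + r ≈ w2 in a low-dimensional space, where a multi-hop dependency path is treated as a sequence of grammatical relations and modeled by a recurrent neural network. Embedding features that take both linear context and dependency context into account are then used for CRF-based aspect term extraction.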
Results: 1) embedding features alone already give good results; 2) embeddings that incorporate syntactic information among words yield better performance.
Mainstream approaches: 1) Unsupervised (or rule-based) methods rely on a set of manually defined opinion words as seeds and on rules derived from syntactic parse trees to iteratively extract aspect terms. 2) Supervised methods treat aspect term extraction (ATE) as a sequence labeling problem, with the conditional random field (CRF) as the dominant model.
representation learning:1) word embeddings 2) structured embeddings of knowledge bases
This paper: focuses on representation learning for aspect term extraction under an unsupervised framework, by learning distributed representations of words and dependency paths from the text corpus.
The learned embeddings of words and dependency paths are utilized as features in CRF for aspect term extraction.
Problem: the embeddings are real values that are not necessarily in a bounded range.
This paper: first maps the continuous embeddings into discrete embeddings, which makes them more appropriate for the CRF model. It then builds the embedding features for aspect term extraction, comprising the target word embedding, the linear context embedding, and the dependency context embedding.
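To make the discretization step concrete, here is a minimal sketch. The notes do not say which mapping the paper uses, so the equal-width binning below, the clipping range [-1, 1], the bin count, and the function names discretize and to_features are all assumptions chosen for illustration.

```python
def discretize(vec, lo=-1.0, hi=1.0, bins=4):
    """Map each real value to an integer bin id in [0, bins - 1].

    Assumption: equal-width bins over a clipped range; the paper's
    actual mapping is not given in these notes.
    """
    width = (hi - lo) / bins
    out = []
    for x in vec:
        clipped = min(max(x, lo), hi)      # bound the unbounded real value
        idx = int((clipped - lo) / width)  # equal-width bin index
        out.append(min(idx, bins - 1))     # x == hi falls into the last bin
    return out


def to_features(prefix, vec):
    """Turn a discretized embedding into categorical feature strings."""
    return [f"{prefix}[{d}]={b}" for d, b in enumerate(discretize(vec))]


if __name__ == "__main__":
    print(discretize([-3.0, -0.4, 0.0, 0.6, 2.5]))
    print(to_features("w", [0.0, 2.5]))
```

Each dimension becomes one categorical feature, which is the kind of input a CRF toolkit with string-valued features can consume directly.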
Related Work:
Unsupervised learning: association rule mining; in addition, opinion words are used to extract infrequent aspect terms. Dependency relations are used as a crucial clue, and the double propagation method iteratively extracts aspect terms and opinion words.
Supervised learning: the mainstream method is still CRF. Li et al. [2010] proposed a new CRF-based machine learning framework that jointly extracts positive opinion words, negative opinion words, and aspect terms.
Dependency paths: carry rich linguistic information about the relations between words.
This paper: learns the semantic composition of dependency paths over dependency trees.
Method:
First, triples (w1, w2, r) are extracted from the dependency trees, where w1 and w2 are two words and the corresponding dependency path r is the shortest path from w1 to w2, consisting of a sequence of grammatical relations.
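The triple extraction can be sketched as follows. This is an illustration only: the tree encoding (a list of head indices plus one relation label per token), the "^-1" suffix marking an arc traversed from dependent to head, the example parse, and the function names are assumptions, not the paper's notation. In a tree, the shortest path between two nodes is the unique path through their lowest common ancestor.

```python
from itertools import permutations


def ancestors(i, heads):
    """Return the chain [i, head(i), head(head(i)), ...] up to the root."""
    chain = [i]
    while heads[chain[-1]] != -1:
        chain.append(heads[chain[-1]])
    return chain


def dependency_path(i, j, heads, rels):
    """Shortest path from token i to token j as a tuple of relation labels."""
    up, down = ancestors(i, heads), ancestors(j, heads)
    down_set = set(down)
    lca = next(a for a in up if a in down_set)  # lowest common ancestor
    # Climb from i to the LCA: each arc is traversed dependent -> head.
    path = [rels[k] + "^-1" for k in up[:up.index(lca)]]
    # Descend from the LCA to j: each arc is traversed head -> dependent.
    path += [rels[k] for k in reversed(down[:down.index(lca)])]
    return tuple(path)


def extract_triples(tokens, heads, rels, max_hops=3):
    """Collect (w1, w2, r) for every ordered word pair within max_hops."""
    triples = []
    for i, j in permutations(range(len(tokens)), 2):
        r = dependency_path(i, j, heads, rels)
        if len(r) <= max_hops:
            triples.append((tokens[i], tokens[j], r))
    return triples


if __name__ == "__main__":
    # Hypothetical parse of "battery life is great"; -1 marks the root.
    tokens = ["battery", "life", "is", "great"]
    heads = [1, 3, 3, -1]
    rels = ["nn", "nsubj", "cop", "root"]
    for triple in extract_triples(tokens, heads, rels):
        print(triple)
```

The max_hops=3 default mirrors the hop limit mentioned later in these notes.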
We notice that considering the lexicalized dependency paths can provide more information for the embedding learning. However, the learning method (negative sampling) would then need to memorize many more dependency path frequencies. For the dependency paths (considering n-hop dependency paths):
|Vword| is the size of the word vocabulary, which exceeds 100,000; Vdep is the set of grammatical relations, and |Vdep| is about 50.
Loss function:
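C1 denotes the set of triples extracted from the dependency trees, which are parsed from the text corpus; r is a sequence of grammatical relations (g1, g2, ..., gn), n is the hop number of r, gi is the i-th grammatical relation in r, and p(r) is the marginal distribution of r. The loss function ensures that a triple (w1, w2, r) receives a higher ranking score than a randomly chosen triple (w1, w2, r'). The ranking score is measured by the inner product of the vector r (or r') and the vector w2 - w1.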
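A ranking objective of this kind is commonly written as a hinge loss, max(0, margin - score(r) + score(r')), with score(r) = (w2 - w1) · r. The sketch below computes that quantity for a single triple and a single sampled r'. The hinge form and the margin value of 1 are assumptions made for illustration; these notes only state that the true path should outrank the random one.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def ranking_loss(w1, w2, r, r_neg, margin=1.0):
    """Hinge ranking loss for one triple (w1, w2, r) and one negative path.

    Assumption: margin-based hinge with margin 1.0; only the ranking
    score (w2 - w1) . r is taken from the notes.
    """
    diff = [b - a for a, b in zip(w1, w2)]  # w2 - w1
    pos = dot(diff, r)                      # score of the observed path r
    neg = dot(diff, r_neg)                  # score of the random path r'
    return max(0.0, margin - pos + neg)


if __name__ == "__main__":
    w1, w2 = [0.0, 0.0], [1.0, 0.0]
    print(ranking_loss(w1, w2, r=[1.0, 0.0], r_neg=[0.0, 1.0]))
    print(ranking_loss(w1, w2, r=[0.0, 1.0], r_neg=[1.0, 0.0]))
```

The loss is zero once the observed path beats the negative one by at least the margin, so only violating triples contribute gradient.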
A recurrent neural network learns the compositional representations of multi-hop dependency paths. The composition operation is implemented through a matrix W:
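f is a hard hyperbolic tangent function (hTanh), [a;b] is the concatenation of two vectors, and the vector gi is the embedding of relation gi. Set h1 = g1 and iterate the composition operation to obtain the final r = hn. The hop number is at most 3, because larger values make training very time-consuming.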
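The recurrence can be sketched in a few lines, reading the composition step as h_i = f(W [h_{i-1}; g_i]). That reading, the d x 2d shape of W, and the absence of a bias term are assumptions inferred from the description of f, [a;b], and h1 = g1; the function names are hypothetical.

```python
def hard_tanh(x):
    """Hard hyperbolic tangent: identity on [-1, 1], clipped outside."""
    return max(-1.0, min(1.0, x))


def compose_path(relation_vecs, W):
    """Compose relation embeddings g1..gn into one path embedding r = hn.

    W is a d x 2d matrix given as a list of rows (assumed shape).
    """
    h = relation_vecs[0]                  # h1 = g1
    for g in relation_vecs[1:]:
        concat = h + g                    # [h_{i-1}; g_i] as list concatenation
        h = [hard_tanh(sum(w * x for w, x in zip(row, concat))) for row in W]
    return h


if __name__ == "__main__":
    # Toy W that simply adds h_{i-1} and g_i, to make the clipping visible.
    W = [[1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, 1.0]]
    g1, g2 = [0.5, -0.2], [0.8, -0.1]
    print(compose_path([g1, g2], W))
```

A one-hop path skips the loop entirely, so its embedding is just the relation embedding g1.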
Multi-task learning with linear context:
Linear context is based on the distributional hypothesis, which assumes that words occurring in similar contexts have similar meanings. Inspired by Skip-gram, the word embeddings are enhanced by maximizing the prediction accuracy of a context word c that occurs in the linear context of a target word w. Each word plays two roles: the target word, and the context word of other target words.
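For the linear context, the training data are (target word, context word) pairs drawn from a window around each position, as in Skip-gram. A minimal sketch of the pair generation follows; the window size and the function name are assumptions, since the notes do not give the paper's setting.

```python
def linear_context_pairs(tokens, window=2):
    """Return (target, context) pairs for every token within the window."""
    pairs = []
    for i, w in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                       # a word is not its own context
                pairs.append((w, tokens[j]))
    return pairs


if __name__ == "__main__":
    for pair in linear_context_pairs(["the", "battery", "life"], window=1):
        print(pair)
```

Note that every word appears on both sides of some pair, which is the "two roles" point above: once as a target and once as the context of a neighbouring target.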
Model training:
Negative sampling is used to train the embedding model.
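One way to draw negative dependency paths is to sample from the empirical frequency of paths seen in the extracted triples, which approximates the marginal distribution p(r). The sketch below does this with the standard library. Rejecting a sample that equals the positive path, the fixed seed, and the name make_negative_sampler are illustrative choices, not details confirmed by these notes.

```python
import random
from collections import Counter


def make_negative_sampler(paths, seed=0):
    """Build a sampler that draws paths in proportion to their frequency."""
    counts = Counter(paths)
    items = list(counts)
    if len(items) < 2:
        raise ValueError("need at least two distinct paths to sample negatives")
    weights = [counts[p] for p in items]
    rng = random.Random(seed)

    def sample(positive, k=1):
        negatives = []
        while len(negatives) < k:
            cand = rng.choices(items, weights=weights, k=1)[0]
            if cand != positive:             # skip the observed path itself
                negatives.append(cand)
        return negatives

    return sample


if __name__ == "__main__":
    observed = [("nsubj",)] * 3 + [("amod",)] + [("nn^-1", "nsubj^-1")] * 2
    sample = make_negative_sampler(observed)
    print(sample(("nsubj",), k=3))
```

This is also why lexicalized paths are costly here: the sampler has to keep one frequency count per distinct path.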
Aspect Term Extraction with Embeddings:
CRF