Quick notes
A Comprehensive Survey on Schema-based Event Extraction with Deep Learning
Some concepts
- Entity: An object or group of objects in a semantic category. Entities mainly include people, organizations, places, times, things, etc.
- Event mention: A phrase or sentence that describes an event and contains a trigger and its corresponding arguments.
- Event type: The nature of the event, i.e. the category the event belongs to, usually represented by the type of the event trigger.
- Event trigger: The core unit in event extraction, typically a verb or a noun. Trigger identification is a key step in pipeline-based event extraction.
- Event argument: The main attributes of an event, including entities, non-entity participants, time, and so on.
- Argument role: The role an argument plays in an event, that is, the relationship between the event argument and the event trigger.
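
To see how these concepts fit together, here is a minimal sketch of one event mention as a data structure. The class names, the example sentence, and the labels (`Acquisition`, `Buyer`, `Target`, `Time`) are made up for illustration and are not taken from the survey or from any dataset schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Argument:
    text: str  # the argument span as it appears in the mention
    role: str  # the role this argument plays relative to the trigger


@dataclass
class EventMention:
    sentence: str    # the phrase or sentence describing the event
    event_type: str  # category of the event
    trigger: str     # the word that most clearly signals the event
    arguments: List[Argument] = field(default_factory=list)

    def roles(self) -> Dict[str, str]:
        """Map each argument role to its text span."""
        return {a.role: a.text for a in self.arguments}


# Hypothetical example; the labels are illustrative only.
mention = EventMention(
    sentence="Acme acquired Beta Corp in 2019.",
    event_type="Acquisition",
    trigger="acquired",
    arguments=[
        Argument("Acme", "Buyer"),
        Argument("Beta Corp", "Target"),
        Argument("2019", "Time"),
    ],
)
print(mention.roles())
```

The entities here are "Acme", "Beta Corp" and "2019"; they become event arguments only once they are attached to a trigger with a role.
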
Schema-based event extraction (EE) can include the following subtasks
- Event classification: Determine whether each sentence describes an event and, if so, which one or more event types it belongs to. The subtask can therefore be seen as a multi-label text classification task.
- Trigger identification: The trigger is generally considered the core unit in event extraction, the one that most clearly expresses an event's occurrence. This subtask is to find the trigger in the text.
- Argument identification: Identify in the text all the arguments that an event type contains. It usually depends on the results of event classification and trigger identification.
- Argument role classification: Based on the arguments defined in the event extraction schema, classify each identified argument into its role category. It can thus also be seen as a multi-label text classification task.
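
The four subtasks can be chained as a pipeline. The sketch below is a toy, rule-based stand-in, assuming a hand-written schema (`SCHEMA`, its trigger words, and its role names are all hypothetical) and entities that have already been recognized. Real systems replace each function with a learned model, and they attach arguments to a specific trigger rather than to every entity in the sentence.

```python
from typing import Dict, List, Tuple

# Hypothetical toy schema: each event type lists its trigger words and maps
# entity types to the role they would play. Not taken from any real dataset.
SCHEMA: Dict[str, dict] = {
    "Transport": {
        "triggers": {"flew", "traveled"},
        "roles": {"PER": "Agent", "LOC": "Destination", "TIME": "Time"},
    },
    "Meet": {
        "triggers": {"met"},
        "roles": {"PER": "Participant", "LOC": "Place"},
    },
}


def identify_triggers(tokens: List[str]) -> List[Tuple[int, str, str]]:
    """Trigger identification: find tokens that match a known trigger word."""
    found = []
    for i, tok in enumerate(tokens):
        for event_type, spec in SCHEMA.items():
            if tok.lower() in spec["triggers"]:
                found.append((i, tok, event_type))
    return found


def classify_events(tokens: List[str]) -> List[str]:
    """Event classification: multi-label, one sentence may hold several types."""
    return sorted({event_type for _, _, event_type in identify_triggers(tokens)})


def extract_arguments(
    entities: List[Tuple[str, str]], event_type: str
) -> List[Tuple[str, str]]:
    """Argument identification plus role classification for one event type.

    Keeps only entities whose entity type has a role in the schema.
    """
    roles = SCHEMA[event_type]["roles"]
    return [(text, roles[etype]) for text, etype in entities if etype in roles]


tokens = "Alice flew to Paris on Monday and met Bob".split()
entities = [("Alice", "PER"), ("Paris", "LOC"), ("Monday", "TIME"), ("Bob", "PER")]

for event_type in classify_events(tokens):
    print(event_type, extract_arguments(entities, event_type))
```

The dependency noted above is visible in the code: `extract_arguments` cannot run until an event type has been chosen, so errors in the earlier steps propagate to the later ones.
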
EE can be framed as the following tasks
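- Classification task: Predefine $n$ event types and their corresponding roles; for example, each event type contains a set of roles. Given an event mention $m$, output a vector $T$ whose entries give the probability that $m$ belongs to each event type. This is generally treated as a multi-label classification task.
- Sequence tagging: Essentially NER over the role plus the text.
- MRC (machine reading comprehension): Define a question schema -> extract argument roles -> determine the event type -> run MRC with the questions.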
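
The three framings differ mainly in the shape of the model output. Below is a rough sketch of the output-side logic for each one, assuming the model scores, the BIO tag scheme, the role names, and the question template, none of which come from the survey. The model itself is left out.

```python
import math
from typing import Dict, List, Tuple


def multilabel_predict(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Classification view: an independent sigmoid per event type.

    Every type whose probability reaches the threshold is kept, so a
    mention can receive zero, one, or several event types.
    """
    probs = {t: 1.0 / (1.0 + math.exp(-z)) for t, z in logits.items()}
    return [t for t, p in probs.items() if p >= threshold]


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Sequence-tagging view: turn BIO role tags into (role, span text) pairs."""
    spans: List[Tuple[str, str]] = []
    role, current = None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # close the span that was open
                spans.append((role, " ".join(current)))
            role, current = tag[2:], [tok]
        elif tag.startswith("I-") and current and tag[2:] == role:
            current.append(tok)  # continue the open span
        else:  # "O" or an inconsistent "I-" tag ends the span
            if current:
                spans.append((role, " ".join(current)))
            role, current = None, []
    if current:
        spans.append((role, " ".join(current)))
    return spans


def build_questions(event_type: str, roles: List[str], trigger: str) -> Dict[str, str]:
    """MRC view: one question per argument role, from a hypothetical template."""
    return {
        r: f"What is the {r} of the {event_type} event triggered by '{trigger}'?"
        for r in roles
    }


# Made-up scores and tags, for illustration only.
print(multilabel_predict({"Attack": 2.0, "Meet": -1.0, "Transport": 0.3}))
print(decode_bio(
    ["Alice", "flew", "to", "New", "York"],
    ["B-Agent", "O", "O", "B-Destination", "I-Destination"],
))
print(build_questions("Transport", ["Agent", "Destination"], "flew"))
```

In the MRC framing each generated question would be passed, together with the sentence, to a reading-comprehension model that returns the answer span as the argument for that role.
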
EE models
- Traditional extraction models
- Deep learning (DL)
  - CNN-based
  - RNN-based
  - Attention-based
  - GCN-based
  - Transformer-based
Research challenges
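- Extracting the dependencies among event arguments
- Few large-scale datasets
- Few-shot learning
- General-purpose event schemas are scarce; at present they are all built by hand
- Specialized domains are harder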