Why? Background
Event extraction is important: its output can serve as the input to many downstream tasks.
What?
A document-level event extraction paradigm composed of Event-classification and Event-Named-Entity-Recognition (Event-NER).
How?
The model is divided into two parts:
- Event-classification model
- Event-NER model for each type of event
The approach is a pipeline rather than joint learning. Pipeline vs. joint learning has long been a hotly debated choice of training setup in NLP, and joint learning used to be the more popular one:
Doesn't the pipeline approach suffer from error accumulation, and doesn't it also increase computational cost (redundant entity computation)?
Doesn't the pipeline approach lack interaction between sub-tasks, ignoring the intrinsic connection between the entity and relation tasks?
Yet Danqi Chen refreshed the SOTA with a pipeline: "Counter-intuitive! Danqi Chen sets a new relation extraction SOTA with a pipeline approach" (反直觉!陈丹琦用pipeline方式刷新关系抽取SOTA).
Event-classification Model
The Event-classification Model is a pre-trained BERT model that is then fine-tuned on this task.
Input: [CLS] + text + [SEP] (the standard BERT input format)
Output: the probability of each event type
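
A minimal sketch of what this classifier might look like with the Hugging Face transformers library; the checkpoint name, the number of event types, and the example text are illustrative assumptions, not details from the paper:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Assumed setup: a generic pre-trained BERT and a hypothetical set of 5 event types.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=5)

text = "..."  # a document to be classified
# The tokenizer adds [CLS] and [SEP] automatically, matching the input format above.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits       # shape: (1, num_event_types)
probs = torch.softmax(logits, dim=-1)     # probability of each event type
```

Fine-tuning is the usual BERT sequence-classification recipe: a cross-entropy loss over these logits against the gold event type.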
Event-NER Model
Each event type has its own Event-NER Model, and each Event-NER Model consists of a pre-trained BERT and a CRF.
Input to the pre-trained BERT: a sequence that has been classified as this event type and pre-processed in the same way as for the Event-classification model.
Input to the CRF: the BERT output. Instead of using the representation of the first token, the Event-NER model feeds the hidden representations of all word tokens into a following CRF layer that models the transition scores for this event type's NER. For all tokens, the CRF layer outputs a score for each event-entity label and uses the Viterbi algorithm to compute the optimal labelling of the whole sequence.
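
A minimal sketch of such a BERT+CRF tagger, assuming the `pytorch-crf` package for the CRF layer; the class name `EventNER`, the label count, and the BERT checkpoint are assumptions for illustration:

```python
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # pytorch-crf package

class EventNER(nn.Module):
    """One BERT+CRF tagger per event type (illustrative sketch)."""
    def __init__(self, num_labels, bert_name="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        # Map every token's hidden state to per-label emission scores.
        self.emission = nn.Linear(self.bert.config.hidden_size, num_labels)
        # The CRF models transition scores between this event type's NER labels.
        self.crf = CRF(num_labels, batch_first=True)

    def forward(self, input_ids, attention_mask, labels=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emission(hidden)   # (batch, seq_len, num_labels)
        mask = attention_mask.bool()
        if labels is not None:
            # Training: negative log-likelihood of the gold label sequence.
            return -self.crf(emissions, labels, mask=mask)
        # Inference: Viterbi decoding of the optimal label sequence.
        return self.crf.decode(emissions, mask=mask)
```

Note that all token representations go into the CRF, in contrast to the classification model, which only uses the first ([CLS]) token.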
Transfer Learning for Event-NER
As mentioned above, each event type has its own Event-NER Model, which means the training data available to each Event-NER Model becomes small. To address this, the approach adopts transfer learning in the form of model fine-tuning: first train a Base-NER Model on all of the training data, then transfer this Base-NER Model to the event-specific Event-NER Models. The rationale is that the text of different types of financial events shares implicit structure that can be exploited.
Note in the figure above that Event-BERT is initialized from Base-BERT, but Event-CRF is not initialized from Base-CRF; it starts from a freshly initialized Init-CRF, because different event types have different paradigms and therefore different NER label sets.
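
Continuing the `EventNER` sketch above, the transfer step might look like the following; the label counts and training steps are assumptions, not values from the paper:

```python
# Base-NER Model trained on the data of all event types (label count assumed).
base_ner = EventNER(num_labels=9)
# ... train base_ner on the full training set ...

# Event-specific model with its own label set for this event type.
event_ner = EventNER(num_labels=5)
# Event-BERT is initialized from Base-BERT ...
event_ner.bert.load_state_dict(base_ner.bert.state_dict())
# ... while event_ner.crf keeps its fresh (Init-CRF) parameters, since the
# NER label sets differ across event types and the Base-CRF transitions
# would not be meaningful here.
# event_ner is then fine-tuned on the data of this event type only.
```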