Transfer Learning for Small-scale Financial Event Extraction




The approach is a document-level event extraction paradigm composed of two parts, Event-classification and Event Named Entity Recognition (Event-NER):



  1. Event-classification model
  2. Event-NER model for each type of event
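The two-stage flow can be sketched as below. This is only an illustration of the dispatch logic, not the authors' code: the event type names (`equity_pledge`, `equity_freeze`), the entity labels, and the toy stand-in models are all hypothetical placeholders for the fine-tuned models.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases: a classifier maps text to an event type,
# an NER model maps text to (entity_text, label) pairs.
Classifier = Callable[[str], str]
NerModel = Callable[[str], List[Tuple[str, str]]]


def extract_event(text: str, classify: Classifier, ner_models: Dict[str, NerModel]) -> dict:
    """Pipeline: stage 1 picks the event type, stage 2 runs that type's NER model."""
    event_type = classify(text)
    entities = ner_models[event_type](text)
    return {"event_type": event_type, "entities": entities}


# Toy stand-ins (assumptions, not real models) so the sketch runs end to end.
def toy_classify(text: str) -> str:
    return "equity_pledge" if "pledge" in text else "equity_freeze"


def toy_pledge_ner(text: str) -> List[Tuple[str, str]]:
    return [(tok, "PLEDGER") for tok in text.split() if tok.istitle()]


def toy_freeze_ner(text: str) -> List[Tuple[str, str]]:
    return [(tok, "HOLDER") for tok in text.split() if tok.istitle()]


TOY_NER = {"equity_pledge": toy_pledge_ner, "equity_freeze": toy_freeze_ner}
result = extract_event("Acme will pledge shares", toy_classify, TOY_NER)
```

Because the stages are decoupled, a classification mistake sends the text to the wrong Event-NER model; that error propagation is the usual cost of a pipeline compared with joint learning.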

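This is a pipeline approach rather than joint learning. Pipeline versus joint learning has long been a much-debated choice of training scheme in NLP, and for some time joint learning was the more popular of the two.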




Event-classification Model

The Event-classification Model is a pre-trained BERT model that is fine-tuned on the classification task.

  • Input: [CLS] + text + [SEP] (the standard BERT input format)

  • Output: the probability of each event type
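The input and output shapes above can be illustrated with a small sketch. It assumes the common fine-tuning setup, in which a linear layer maps the [CLS] representation to one logit per event type and a softmax turns the logits into probabilities; the source does not spell this out. The function names and the example logits are made up for illustration.

```python
import math
from typing import Dict, List


def build_input(tokens: List[str]) -> List[str]:
    """Wrap the tokenized text in BERT's special tokens."""
    return ["[CLS]"] + tokens + ["[SEP]"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def event_probabilities(cls_logits: List[float], event_types: List[str]) -> Dict[str, float]:
    """Map per-type logits (assumed to come from the [CLS] head) to probabilities."""
    return dict(zip(event_types, softmax(cls_logits)))


tokens = build_input(["shares", "were", "pledged"])
# Hypothetical logits and event type names, standing in for model output.
probs = event_probabilities([2.0, 0.5, -1.0], ["pledge", "freeze", "other"])
predicted = max(probs, key=probs.get)
```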

Event-NER Model

Each event type has its own Event-NER Model, and every Event-NER Model consists of a pre-trained BERT followed by a CRF.

  • Input to the pre-trained BERT: a sequence that has been classified as this type of event, pre-processed in the same way as for the Event-classification model.

  • Input to the CRF: the output of BERT. Instead of using the representation of the first token, the Event-NER model feeds the hidden representations of all the word tokens into a CRF layer that models the transition scores for this type of Event-NER. For every token, the CRF layer outputs a score for each event-entity label, and the Viterbi algorithm is used to compute the optimal labelling of the whole sequence.
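The decoding step can be sketched as follows. This is a minimal Viterbi implementation for a linear-chain CRF, assuming per-token label scores (`emissions`) and a label-to-label score matrix (`transitions`); start and end transition scores are left out for brevity, and the three-label example (O, B, I) is invented for illustration.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence.

    emissions[t][j]: score of label j at token t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label for arriving at label j.
            best_prev = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final label.
    best_last = max(range(n_labels), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Labels: 0 = O, 1 = B, 2 = I. The O -> I transition is heavily penalized.
emissions = [[1.0, 0.5, 0.0], [0.0, 0.2, 1.0]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
best_path = viterbi(emissions, transitions)
```

Picking the best label per token independently would give O followed by I here; the transition scores are what steer the decoder to B followed by I, which is the point of putting a CRF on top of BERT.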

Transfer Learning for Event-NER

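As noted above, each event type has its own Event-NER Model, so the dataset available to each individual Event-NER Model becomes smaller. To address this, the solution uses Transfer Learning, specifically by fine-tuning a model. A Base-NER Model is first trained on all of the training data, and this Base-NER Model is then transferred to each event-specific Event-NER Model. The reasoning is that text from different types of financial events shares implicit structure that can be exploited.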

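One detail of this transfer scheme is worth noting: Event-BERT is initialized from Base-BERT, but Event-CRF is not initialized from Base-CRF. It starts from a freshly initialized Init-CRF instead, because different types of events have different paradigms and therefore different NER labels.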
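The initialization rule can be sketched with plain Python containers standing in for model weights. Everything here is a simplified assumption made for illustration: the dictionary layout, the `init_crf` and `init_event_ner` helpers, and the label counts are hypothetical, and a real implementation would copy framework tensors instead.

```python
import copy
import random
from typing import Dict, List


def init_crf(num_labels: int, seed: int = 0) -> List[List[float]]:
    """Fresh (Init-CRF) transition matrix with small random values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(num_labels)]
            for _ in range(num_labels)]


def init_event_ner(base_model: Dict, num_event_labels: int) -> Dict:
    """Build an event-specific model: reuse the BERT weights, reset the CRF."""
    return {
        # Event-BERT starts from Base-BERT (a deep copy, so later
        # fine-tuning does not modify the base model).
        "bert": copy.deepcopy(base_model["bert"]),
        # Event-CRF is re-initialized because the label set differs per event.
        "crf": init_crf(num_event_labels),
    }


# Toy Base-NER model: made-up weights and a 9-label CRF.
base_model = {"bert": {"layer0.weight": [0.3, -0.2]}, "crf": init_crf(9, seed=1)}
pledge_model = init_event_ner(base_model, num_event_labels=5)
```

Each event-specific model would then be fine-tuned on only the examples of its own event type.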

Author: CarlYoung
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit CarlYoung when reposting!