A Unified MRC Framework for Named Entity Recognition

NLP

Published: 2021-01-27

Paper: https://arxiv.org/pdf/1910.11476.pdf

Source code: https://github.com/ShannonAI/mrc-for-flat-nested-ner

@article{li2019unified,
 title={A Unified MRC Framework for Named Entity Recognition},
 author={Li, Xiaoya and Feng, Jingrong and Meng, Yuxian and Han, Qinghong and Wu, Fei and Li, Jiwei},
 journal={arXiv preprint arXiv:1910.11476},
 year={2019}
}

Background

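NER tasks currently fall into nested NER and flat NER. The two are usually handled by different models, and few unified models handle both (sequence labeling, for example, can only handle flat NER).
In his course "Human Language Processing", Hung-yi Lee notes that, with the rise of BERT, more and more NLP tasks can be solved within the MRC framework.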

Contributions

Proposes a single model, BERT-MRC, that handles both nested NER and flat NER.
BERT-MRC casts NER as an MRC task. This query-based question-answering formulation can encode informative prior knowledge.

Terminologies

Sequence Labeling

Given an input sequence, output the class that each token in the sequence belongs to.

Sequence labeling tasks include Named Entity Recognition, POS tagging, and so on.

Named Entity Recognition

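Traditionally, this task is solved with a CRF. As covered earlier when studying BiLSTM-CRF, a Person entity has the tags B-Person (beginning of a Person), I-Person (inside a Person), E-Person (end of a Person), and S-Person (a single token that is a Person by itself). For example: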

Carl Young is so diligent!

The token Carl should be tagged B-Person, and the token Young should be tagged E-Person.
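The tagging scheme above can be sketched in a few lines of Python. This is only an illustration: the function name `spans_to_bies` and the `O` tag for tokens outside any entity are assumptions, not something from the paper. The sketch also shows why one tag per token cannot represent nested entities.

```python
def spans_to_bies(tokens, spans):
    """Convert flat (non-overlapping) entity spans into B/I/E/S tags.

    spans: list of (start, end, label) with inclusive token indices.
    Tokens outside every entity get the tag "O" (an assumed convention).
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if any(tag != "O" for tag in tags[start:end + 1]):
            # Each token carries exactly one tag, so a nested span has no slot.
            raise ValueError("overlapping spans cannot be encoded with one tag per token")
        if start == end:
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"
            tags[end] = f"E-{label}"
    return tags


tokens = "Carl Young is so diligent !".split()
tags = spans_to_bies(tokens, [(0, 1, "Person")])
print(list(zip(tokens, tags)))
```

Passing two overlapping spans, for example `[(0, 1, "Person"), (1, 1, "Person")]`, raises the `ValueError`, which is the limitation that motivates a span-based formulation.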

Machine Reading Comprehension (MRC)

The task is simple: given a question, find the answer span [start, end] in the text.

Model

Data Preprocessing

From the original dataset, build many question-answer triples of the form $(QUESTION, ANSWER, CONTEXT)$; one example is $(q_y, x_{start,end}, X)$.

There are many ways to generate the question:

Position index of labels: a query is constructed using the index of a tag, i.e., “one”, “two”, “three”.
Keyword: a query is the keyword describing the tag, e.g., the question query for tag ORG is “organization”.
Rule-based template filling: generates questions using templates. The query for tag ORG is “which organization is mentioned in the text”.
Wikipedia: a query is constructed using its Wikipedia definition. The query for tag ORG is “an organization is an entity comprising multiple people, such as an institution or an association.”
Synonyms: words or phrases that mean exactly or nearly the same as the original keyword, extracted using the Oxford Dictionary. The query for tag ORG is “association”.
Keyword + Synonyms: the concatenation of a keyword and its synonym.
Annotation guideline notes: the method used in the paper. The query for tag ORG is “find organizations including companies, agencies and institutions”.
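As a rough sketch of this preprocessing step, the snippet below turns one annotated sentence into one (question, answer spans, context) triple per entity label. The ORG query is the annotation-guideline query quoted above; the GPE and PER queries, the function name `build_mrc_examples`, and the toy sentence are made up for illustration.

```python
# Label -> query table. Only the ORG query comes from the text above;
# the GPE and PER queries are hypothetical placeholders.
QUERIES = {
    "ORG": "find organizations including companies, agencies and institutions",
    "GPE": "find countries, cities and states",
    "PER": "find persons",
}


def build_mrc_examples(tokens, entities, queries=QUERIES):
    """Build one (question, answers, context) triple per label.

    entities: list of (start, end, label) with inclusive token indices.
    Spans of different labels may overlap, because each label gets its own query.
    """
    examples = []
    for label, query in queries.items():
        answers = [(start, end) for start, end, tag in entities if tag == label]
        examples.append((query, answers, tokens))
    return examples


tokens = "The Bank of China hired Carl Young".split()
entities = [(1, 3, "ORG"), (3, 3, "GPE"), (5, 6, "PER")]
examples = build_mrc_examples(tokens, entities)
for question, answers, context in examples:
    print(question, "->", [" ".join(context[s:e + 1]) for s, e in answers])
```

Note that "China" sits inside "Bank of China" without any conflict, since the two spans answer different questions.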

Model Backbone

1. BERT

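After tokenization, the input to BERT is $\{[CLS], q_1, q_2, \dots, q_m, [SEP], x_1, x_2, \dots, x_n\}$, and only the representation of $X$ is needed in the end.

The representation of $q_y$ is not needed: its information has already been folded into the representation of $X$ through BERT's attention mechanism.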
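A small sketch of how the input could be assembled and how the context rows could be picked out afterwards. No real BERT is run here: `hidden` is a stand-in for the encoder output, and `build_input` and `select_context` are hypothetical helper names. A real pipeline would also use WordPiece tokenization and usually a trailing `[SEP]`, which the formula above leaves out.

```python
def build_input(query_tokens, context_tokens):
    """Concatenate [CLS] query [SEP] context and mark the context positions."""
    tokens = ["[CLS]"] + query_tokens + ["[SEP]"] + context_tokens
    # 0 for [CLS], the query and [SEP]; 1 for every context token.
    context_mask = [0] * (len(query_tokens) + 2) + [1] * len(context_tokens)
    return tokens, context_mask


def select_context(hidden_states, context_mask):
    """Keep only the encoder rows that belong to context tokens."""
    return [h for h, keep in zip(hidden_states, context_mask) if keep]


query = "find persons".split()
context = "Carl Young is so diligent !".split()
tokens, mask = build_input(query, context)

# Stand-in for the BERT output: one fake 2-dimensional vector per position.
hidden = [[float(i), float(-i)] for i in range(len(tokens))]
E = select_context(hidden, mask)

print(tokens)
print(len(E), "context vectors kept out of", len(hidden))
```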

2. Span Selection

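After obtaining the representation of each token in $X$, two classification tasks are run on every token representation:

Task 1: predict whether this token is the start index of an answer to $q_y$;
Task 2: predict whether this token is the end index of an answer to $q_y$.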

start index classifier

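Task 1 and Task 2 use the same method, so only Task 1 is described here.

Each token is fed into the start index classifier, which outputs $P_{is \ start \ index}$ and $P_{is \ not \ start \ index}$. The token takes whichever label has the higher probability.
$$
P_{start} = \mathrm{softmax}_{each \ row}(E \cdot T_{start}) \in \mathbb{R}^{n \times 2}
$$
Here $E \in \mathbb{R}^{n \times d}$ is the matrix of token representations output by BERT, and $T_{start} \in \mathbb{R}^{d \times 2}$ holds the weights to learn.

A softmax is used here rather than a logistic (sigmoid) function; the reason is not clear. Remember the difference between logits and logistic.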
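A minimal pure-Python sketch of the row-wise softmax above, using toy numbers rather than real BERT outputs. The helper names, the values of `E` and `T_start`, and the choice of column 1 as the "is start" class are all assumptions for illustration. The last lines check a general identity that may help with the softmax-versus-logistic question: a two-class softmax gives the same probability as a sigmoid applied to the difference of the two logits.

```python
import math


def matmul_rows(E, T):
    """Multiply E (n x d) by T (d x 2), giving n rows of 2 logits."""
    return [[sum(e[k] * T[k][c] for k in range(len(e))) for c in range(2)] for e in E]


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy values: 3 tokens with d = 2 features, and made-up classifier weights.
E = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]]
T_start = [[0.5, -0.5], [-1.0, 1.0]]

logits = matmul_rows(E, T_start)
P_start = [softmax(row) for row in logits]

# Column 1 is treated as "is a start index" (an assumed column order).
is_start = [int(p[1] > p[0]) for p in P_start]
print(is_start)

# Two-class softmax equals a sigmoid of the logit difference.
for row, p in zip(logits, P_start):
    assert abs(p[1] - sigmoid(row[1] - row[0])) < 1e-12
```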

start-end matching classifier

For flat NER, matching each start with its nearest end is enough, but that no longer works for nested NER.
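To make this concrete, here is a sketch of the nearest-end heuristic, with a hypothetical function name and toy 0/1 flags standing in for the classifier outputs. It recovers flat spans, but on a nested example it pairs both starts with the same inner end and loses the outer span.

```python
def nearest_end_matching(start_flags, end_flags):
    """Pair every predicted start with the nearest predicted end at or after it."""
    ends = [j for j, flag in enumerate(end_flags) if flag]
    spans = []
    for i, flag in enumerate(start_flags):
        if not flag:
            continue
        following = [j for j in ends if j >= i]
        if following:
            spans.append((i, following[0]))
    return spans


# Flat case: gold spans (0, 1) and (3, 4).
flat = nearest_end_matching([1, 0, 0, 1, 0], [0, 1, 0, 0, 1])
print(flat)

# Nested case: gold spans (0, 4) and (1, 2). The outer span is lost.
nested = nearest_end_matching([1, 1, 0, 0, 0], [0, 0, 1, 0, 1])
print(nested)
```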

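Every token marked as a start index is paired with every token marked as an end index, and each pair is fed into the start-end matching classifier, which predicts whether the two match.
$$
P_{i_{start},j_{end}} = \mathrm{sigmoid}(m \cdot \mathrm{concat}(E_{i_{start}}, E_{j_{end}}))
$$
Here $m \in \mathbb{R}^{1 \times 2d}$ is a learnable weight vector, where $d$ is the dimension of a token representation.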
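The matching step can be sketched as below, again in plain Python with toy numbers. The weight vector `m`, the embeddings `E`, the 0.5 decision threshold, and the rule that an end may not precede its start are assumptions for illustration and are not taken from the released code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def match_spans(E, start_idx, end_idx, m, threshold=0.5):
    """Score each (start, end) candidate with sigmoid(m . concat(E_i, E_j))."""
    spans = []
    for i in start_idx:
        for j in end_idx:
            if j < i:
                continue  # assumed constraint: an end cannot precede its start
            pair = E[i] + E[j]  # list concatenation plays the role of concat
            score = sigmoid(sum(w * x for w, x in zip(m, pair)))
            if score > threshold:
                spans.append((i, j))
    return spans


# Toy token representations (d = 2) and a made-up weight vector of size 2d.
E = [[1.0, 0.0], [-1.0, 0.0], [0.0, 0.5], [0.0, 0.0], [0.0, 2.0]]
m = [1.0, 0.0, 0.0, 1.0]

matched = match_spans(E, start_idx=[0, 1], end_idx=[2, 4], m=m)
print(matched)
```

Unlike the nearest-end heuristic, one start can be matched with several ends here, which is what allows overlapping spans to be extracted.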

Loss Function

Notation:

$Y_{start}, Y_{end}$ are the ground-truth labels of every token in the sequence; $P_{start}, P_{end}$ are the corresponding predictions.
$Y_{start,end}$ are the ground-truth spans in the sequence; $P_{start,end}$ are the corresponding predictions.

This gives three loss functions:
$$
\begin{aligned}
\mathcal{L}_{start} &= CE(P_{start}, Y_{start}) \\
\mathcal{L}_{end} &= CE(P_{end}, Y_{end}) \\
\mathcal{L}_{span} &= CE(P_{start,end}, Y_{start,end})
\end{aligned}
$$
The total loss function is:
$$
\mathcal{L} = \alpha \mathcal{L}_{start} + \beta \mathcal{L}_{end} + \gamma \mathcal{L}_{span}
$$
where $\alpha, \beta, \gamma$ are hyperparameters.
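A small numeric sketch of the combined loss, assuming mean cross-entropy over tokens for the start and end terms and mean binary cross-entropy over candidate pairs for the span term. The probabilities, labels, and the default weights of 1.0 are toy values chosen for illustration, not settings from the paper.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-likelihood; probs[k][c] is the probability of class c."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def binary_cross_entropy(probs, labels):
    """Mean binary cross-entropy for match probabilities strictly between 0 and 1."""
    terms = [y * math.log(p) + (1 - y) * math.log(1 - p) for p, y in zip(probs, labels)]
    return -sum(terms) / len(labels)


def total_loss(l_start, l_end, l_span, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted sum of the three losses; the weights are hyperparameters."""
    return alpha * l_start + beta * l_end + gamma * l_span


# Toy predictions for a 2-token sequence and 2 candidate spans.
P_start, Y_start = [[0.9, 0.1], [0.2, 0.8]], [0, 1]
P_end, Y_end = [[0.3, 0.7], [0.6, 0.4]], [1, 0]
P_span, Y_span = [0.7, 0.2], [1, 0]

l_start = cross_entropy(P_start, Y_start)
l_end = cross_entropy(P_end, Y_end)
l_span = binary_cross_entropy(P_span, Y_span)
loss = total_loss(l_start, l_end, l_span)
print(round(loss, 4))
```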

Experiment

Reflections

Further Reading

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.
ELMo
Xiaoya Li, Fan Yin, Zijun Sun, Xiayu Li, Arianna Yuan, Duo Chai, Mingxin Zhou, and Jiwei Li. 2019. Entity-relation extraction as multi-turn question answering. In Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28-August 2, 2019, Volume 1: Long Papers, pages 1340–1350.

References

Bert-MRC: A Unified MRC Framework for Named Entity Recognition (paper walkthrough)

CarlYoung

http://yc1999.github.io/2021/01/27/a-unified-mrc-framework-for-named-entity-recognition/

Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0. Please credit CarlYoung when reposting!

Paper Reading NLP MRC

