Extensively Matching for Few-shot Learning Event Detection

Reading notes on the ACL 2020 paper "Extensively Matching for Few-shot Learning Event Detection".

1. Background

  1. Typical event detection (ED) methods are feature-engineering-based supervised learning and neural networks, but supervised models handle unseen event types poorly: the usual remedy is to annotate new data and retrain, which is costly.
  2. One potential problem of prior FSL methods is that the model relies solely on the training signal between the query instance and the support set; the matching information between samples within the support set has not been exploited yet. In other words, earlier few-shot work only uses the $L_{query}$ term introduced below, without the $L_{intra}$ and $L_{inter}$ terms.

2. Contributions

  1. First work to formulate ED as a few-shot learning problem;

  2. Augments the loss function with two kinds of matching information:

    1. matching information between query instance and the support set;
    2. matching information between the samples in the support set themselves;

    The paper calls these two kinds of matching information training signals.

  3. The two proposed training signals bring significant gains and can be applied to any metric-based FSL model

3. Terminology

Few-shot learning

In FSL, a trained model rapidly learns a new concept from a few examples while keeping great generalization from observed examples. Hence, if we need to extend event detection into a new domain, only a few examples are needed to activate the system in the new domain without retraining the model. By formulating ED as FSL, we can significantly reduce the annotation cost and training cost while maintaining highly accurate results.

How to do few-shot learning?

In a few-shot learning iteration, the model is given a support set and a query instance. The support set consists of examples from a small set of classes. The model must predict the label of the query instance from the set of classes appearing in the support set.
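
To make this concrete, here is a minimal sketch of one N-way K-shot episode in plain Python/NumPy, using a nearest-prototype rule (all names and the toy data are illustrative, not the paper's code):

```python
import numpy as np

def run_episode(support: dict[str, np.ndarray], query: np.ndarray) -> str:
    """One FSL episode: `support` maps each of N class labels to a
    (K, dim) array of encoded examples; `query` is a (dim,) vector.
    Predict the query's label among the support-set classes only."""
    # One prototype per class: the mean of its K support vectors.
    prototypes = {label: vecs.mean(axis=0) for label, vecs in support.items()}
    # The class with the nearest prototype (Euclidean distance) wins.
    return min(prototypes, key=lambda lbl: np.linalg.norm(query - prototypes[lbl]))

# Toy 5-way 3-shot episode with random 4-d "encodings".
rng = np.random.default_rng(0)
support = {f"event_{i}": rng.normal(size=(3, 4)) for i in range(5)}
query = rng.normal(size=4)
print(run_episode(support, query))  # one of event_0 .. event_4
```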

4. Models

Event Detection as Few-shot Learning

FSL models usually follow the N-way K-shot paradigm to classify the query instance. The authors add one extra class, NULL, for non-event instances, turning the task into an (N+1)-way K-shot problem.

With the NULL class added to the support set ((N+1)-way K-shot), the label set becomes (see the snippet after the list):

  • $(t_{1},…,t_{N})$: positive labels
  • $t_{null}$: a special label for non-event
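
Continuing the episode sketch above, the NULL class is just one more entry in the support set (how the non-event examples are sampled is an implementation detail; random toy vectors are assumed here):

```python
# Extend the toy support set from the sketch above with the special
# non-event class, turning 5-way into (5+1)-way 3-shot.
support["t_null"] = rng.normal(size=(3, 4))   # K=3 "non-event" encodings
print(run_episode(support, query))            # now event_0..event_4 or t_null
```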

Framework

Instance Encoder

Each word of the sentence $s$ is first mapped to a word embedding.

A neural network (CNN, LSTM, or GCN) is then run over the word sequence of $s$ to produce a representation of the whole sentence.
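
A minimal PyTorch sketch of the CNN variant (the embedding size, filter width, and all names are assumptions for illustration, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class CNNInstanceEncoder(nn.Module):
    """Embed each word of a sentence, then pool a CNN over the
    word sequence into a single sentence representation."""
    def __init__(self, vocab_size: int, emb_dim: int = 100, hidden: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel_size=3, padding=1)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> (batch, seq_len, emb_dim)
        x = self.embedding(token_ids)
        # Conv1d expects (batch, channels, seq_len).
        x = torch.relu(self.conv(x.transpose(1, 2)))
        # Max-pool over the sequence -> one vector per sentence.
        return x.max(dim=2).values

enc = CNNInstanceEncoder(vocab_size=10_000)
sentences = torch.randint(0, 10_000, (8, 20))   # batch of 8 sentences, 20 tokens
print(enc(sentences).shape)                     # torch.Size([8, 128])
```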

Prototype Encoder

This module computes a representative vector, called a prototype, for each class in the support set.

There are two ways to obtain it: simple averaging of the support vectors, or a weighted average whose weights come from an attention mechanism.
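
A sketch of both options in PyTorch (the query-conditioned softmax attention is one common weighting scheme, assumed here for illustration):

```python
import torch

def mean_prototype(support_vecs: torch.Tensor) -> torch.Tensor:
    """Simple averaging: (K, dim) support encodings -> (dim,) prototype."""
    return support_vecs.mean(dim=0)

def attentive_prototype(support_vecs: torch.Tensor, query_vec: torch.Tensor) -> torch.Tensor:
    """Weighted averaging: support examples that look more like the
    query get larger attention weights (illustrative scheme)."""
    scores = support_vecs @ query_vec        # (K,) dot-product scores
    weights = torch.softmax(scores, dim=0)   # normalized to sum to 1
    return weights @ support_vecs            # (dim,) weighted prototype
```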

Classification Module

The classification module scores the encoded query $v$ against every prototype and normalizes with a softmax, following the standard metric-based form:
$$
P(y=t_{i}|x,S)=\frac{\exp(d(v,c_{i}))}{\sum_{j=1}^{N+1}\exp(d(v,c_{j}))}
$$
Here $d()$ is a matching score (distances such as Euclidean enter with a negative sign) and can be one of four choices, sketched in code after the list:

  1. Cosine similarity with averaged prototypes, as in the Matching network
  2. Euclidean distance with averaged prototypes, as in the Prototypical (Proto) network
  3. Euclidean distance with attention-weighted prototypes, as in the Proto+Att network
  4. A learnable distance function with averaged prototypes, as in the Relation network
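
The four options, sketched in PyTorch as scoring functions over a query encoding `v` and a prototype `c` (the Relation-network scorer is reduced to a tiny MLP for illustration; the original architectures differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def cosine_score(v: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    # 1. Matching-network style: cosine similarity (higher = closer).
    return F.cosine_similarity(v, c, dim=-1)

def euclidean_score(v: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    # 2./3. Proto and Proto+Att style: negative squared Euclidean
    # distance, so that higher still means closer before the softmax.
    return -((v - c) ** 2).sum(dim=-1)

class RelationScore(nn.Module):
    """4. Relation-network style: a small learnable network scores the
    (query, prototype) pair instead of using a fixed metric."""
    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, v: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([v, c], dim=-1)).squeeze(-1)
```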

Training Objectives

Training the ED model with the matching information both between the query instance and the support set and between the samples within the support set significantly reduces annotation and training cost while maintaining high accuracy. Concretely, this is done by adding auxiliary terms to the loss function to constrain the learning process.

  • Query loss (negative log-likelihood)
    $$
    L_{query}(x,S)=-\log P(y=t|x,S) \tag{1}
    $$
    where $x$, $t$, and $S$ are the query instance, the ground-truth label, and the support set

  • Intra-cluster matching

    Vectors of samples from the same class should be similar, so their pairwise distances are minimized ($v_{i}^{j}$ is the encoding of the $j$-th support example of class $i$):
    $$
    L_{intra}=\sum\limits_{i=1}^{N}\sum\limits_{k=1}^{K}\sum\limits_{j=k+1}^{K}mse(v_{i}^{j},v_{i}^{k}) \tag{2}
    $$

  • Inter-cluster information

    Maximize the distance between different classes, where $c_{i}$ denotes the prototype of class $i$:
    $$
    L_{inter}=1-\sum\limits_{i=1}^{N}\sum\limits_{j=i+1}^{N}cosine(c_{i},c_{j}) \tag{3}
    $$

  • Overall loss

    Combining (1), (2), and (3), presumably as a weighted sum with trade-off hyperparameters $\alpha$ and $\beta$ (a code sketch follows below):
    $$
    L(x,S)=L_{query}(x,S)+\alpha L_{intra}+\beta L_{inter}
    $$
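
A minimal PyTorch sketch of the three terms and their weighted combination (tensor layouts, the loop-based implementation, and the weights `alpha`/`beta` are illustrative assumptions, not the paper's exact code):

```python
import torch
import torch.nn.functional as F

def query_loss(logits: torch.Tensor, target: int) -> torch.Tensor:
    # Eq. (1): negative log-likelihood of the query's true class.
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([target]))

def intra_loss(support: torch.Tensor) -> torch.Tensor:
    # Eq. (2): support is (N, K, dim); pull same-class encodings
    # together by penalizing pairwise MSE within each class.
    N, K, _ = support.shape
    total = support.new_zeros(())
    for i in range(N):
        for k in range(K):
            for j in range(k + 1, K):
                total = total + F.mse_loss(support[i, j], support[i, k])
    return total

def inter_loss(prototypes: torch.Tensor) -> torch.Tensor:
    # Eq. (3): prototypes is (N, dim); push different classes apart
    # via their pairwise cosine similarities.
    N = prototypes.shape[0]
    sim = prototypes.new_zeros(())
    for i in range(N):
        for j in range(i + 1, N):
            sim = sim + F.cosine_similarity(prototypes[i], prototypes[j], dim=0)
    return 1 - sim

def total_loss(logits, target, support, prototypes, alpha=0.1, beta=0.1):
    # Weighted combination of (1), (2), (3); alpha/beta are assumed.
    return (query_loss(logits, target)
            + alpha * intra_loss(support)
            + beta * inter_loss(prototypes))
```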

5. Experiments

  • Table 1 shows:
    • 5+1-way 5-shot always outperforms 10+1-way 10-shot, because the latter has twice as many classes to discriminate
    • Proto and Proto+Att perform best overall
    • In the 10+1-way 10-shot setting, Proto+Att is slightly better than Proto
  • Table 2 shows:
    • With the proposed loss terms, the F1 score improves markedly for all of the neural models

6. Ablation Study

The ablation table above reports, for each model, results with neither auxiliary loss, with only $L_{inter}$, with only $L_{intra}$, and with both. Both loss terms clearly improve the results, and removing either one causes a considerable drop in accuracy.

