Extensively Matching for Few-shot Learning Event Detection

Reading notes on the ACL 2020 paper "Extensively Matching for Few-shot Learning Event Detection".

1. Background

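  1. Typical event detection (ED) methods today are supervised, built on either feature engineering or neural networks. However, supervised models handle unseen event types poorly: covering a new type usually means annotating more data and retraining, which is costly.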
  2. One potential problem of prior FSL methods is that the model relies solely on training signals between the query instance and the support set. Thus, the matching information between samples in the support set has not yet been exploited. In other words, previous few-shot methods rely only on $L_{query}$ (defined below) and do not use $L_{intra}$ or $L_{inter}$.

2. Contributions

  1. The first work to formulate ED as a few-shot learning problem;

  2. Enhances the loss function with two kinds of matching information:

    1. matching information between the query instance and the support set;
    2. matching information among the samples within the support set itself;

    The paper calls these two kinds of matching information training signals.

  3. The two proposed training signals yield significant improvements and can be applied to any metric-based FSL model.

3. Terminology

Few-shot learning

In FSL, a trained model rapidly learns a new concept from a few examples while retaining the generalization it acquired from previously observed examples. Hence, to extend event detection to a new domain, only a few examples are needed to activate the system in that domain, without retraining the model. By formulating ED as FSL, we can significantly reduce annotation and training costs while maintaining highly accurate results.

How is few-shot learning done?

In each few-shot learning iteration (episode), the model is given a support set and a query instance. The support set consists of examples from a small set of classes. The model must predict the label of the query instance from among the classes that appear in the support set.
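A single episode as described above can be sketched in plain Python. This is a minimal illustration, assuming the labelled data is a dict from class label to a list of instances; `sample_episode` and the toy labels and sentence ids are hypothetical, not taken from the paper.

```python
import random

def sample_episode(data, n_way, k_shot, rng=None):
    """Sample one N-way K-shot episode.

    `data` maps each class label to a list of instances (e.g. sentences).
    Returns (support, query, query_label), where `support` maps each of the
    N sampled labels to K instances and `query` is a held-out instance.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(data), n_way)   # the N classes of this episode
    query_label = rng.choice(classes)           # the query belongs to one of them
    support, query = {}, None
    for c in classes:
        extra = 1 if c == query_label else 0
        picked = rng.sample(data[c], k_shot + extra)  # draw without replacement
        support[c] = picked[:k_shot]
        if extra:
            query = picked[k_shot]              # never appears in the support set
    return support, query, query_label

# Toy usage with made-up labels and sentence ids.
data = {
    "Attack": ["a1", "a2", "a3"],
    "Meet": ["m1", "m2", "m3"],
    "Die": ["d1", "d2", "d3"],
}
support, query, label = sample_episode(data, n_way=2, k_shot=2, rng=random.Random(0))
print(label, query, support)
```

Drawing the query together with the support examples of its class guarantees the query is never also a support example.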

4. Models

Event Detection as Few-shot Learning

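FSL models typically predict the label of a query instance under the N-way K-shot paradigm. Here the authors add one extra class, $NULL$, which turns the problem into (N+1)-way K-shot.

With the NULL class added to the support set, the (N+1)-way K-shot label set is: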

  • $(t_{1},…,t_{N})$: positive labels
  • $t_{null}$: a special label for non-event
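Building such an (N+1)-way support set can be sketched as below. This assumes event instances are grouped by label and that non-event instances come from a separate pool; the constant `NULL` and the function `build_support_with_null` are hypothetical names used only for illustration.

```python
import random

NULL = "NULL"  # hypothetical name for t_null, the non-event label

def build_support_with_null(event_data, null_pool, n_way, k_shot, rng=None):
    """Build an (N+1)-way K-shot support set: N event classes plus NULL.

    `event_data` maps event labels to instances; `null_pool` is a list of
    non-event instances.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(event_data), n_way)           # positive labels t_1..t_N
    support = {c: rng.sample(event_data[c], k_shot) for c in classes}
    support[NULL] = rng.sample(null_pool, k_shot)              # K non-event examples
    return support

events = {"Attack": ["a1", "a2"], "Meet": ["m1", "m2"], "Die": ["d1", "d2"]}
non_events = ["n1", "n2", "n3"]
support = build_support_with_null(events, non_events, n_way=2, k_shot=2, rng=random.Random(1))
print(sorted(support))  # two event labels plus "NULL"
```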


Instance Encoder

This module simply maps each word in the sentence $s$ to its word embedding.
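The lookup step can be sketched with a toy table. Here the random table stands in for pretrained embeddings, and `build_embedding_table`, `embed_sentence`, and the zero vector for unknown words are assumptions for illustration. The later modules operate on one vector per instance, so some pooling or encoder on top of these word vectors is assumed but not shown.

```python
import random

def build_embedding_table(vocab, dim, seed=0):
    """Toy embedding table with small random vectors (a stand-in for pretrained ones)."""
    rng = random.Random(seed)
    return {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in vocab}

def embed_sentence(tokens, table, dim):
    """Look up an embedding for each word of the sentence; unknown words map to zeros."""
    unk = [0.0] * dim
    return [table.get(tok.lower(), unk) for tok in tokens]

vocab = ["the", "army", "attacked", "city"]
table = build_embedding_table(vocab, dim=4)
sentence = "The army attacked the village".split()
vectors = embed_sentence(sentence, table, dim=4)
print(len(vectors))  # one vector per word: 5
```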


Prototype Encoder

This module computes, for each class, a representative vector called its prototype from that class's support examples.
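The two prototype variants mentioned under Classification Module (averaging and weighted sum) can be sketched as follows. The dot-product-plus-softmax scoring in `attention_prototype` is an assumption chosen for simplicity; it is not necessarily the attention used by Proto+Att.

```python
import math

def mean_prototype(vectors):
    """Averaging prototype: element-wise mean of a class's K support vectors."""
    k = len(vectors)
    return [sum(column) / k for column in zip(*vectors)]

def attention_prototype(vectors, query):
    """Weighted-sum prototype: support vectors weighted by a softmax over
    their dot products with the query (dot-product scoring is an assumption)."""
    scores = [sum(a * b for a, b in zip(v, query)) for v in vectors]
    top = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(len(query))]

support = [[1.0, 0.0], [3.0, 2.0]]
print(mean_prototype(support))                   # [2.0, 1.0]
print(attention_prototype(support, [0.0, 0.0]))  # equal weights, so the same as the mean
```

A query-dependent prototype lets support examples that resemble the query count more, whereas the mean treats all K examples equally.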


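Classification Module

The paper compares four metric-based FSL models, which differ in the distance function and in how the prototype is built:
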
  1. Cosine similarity with averaging prototype as Matching network
  2. Euclidean distance with averaging prototype as Proto network
  3. Euclidean distance with weighted sum prototype as Proto+Att network
  4. Learnable distance function with averaging prototype as Relation network
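The fixed-metric variants above can be sketched as a softmax over similarity scores between the query vector and each prototype. This is a minimal illustration, assuming squared Euclidean distance (as in prototypical networks) and a softmax over the scores; `class_probs` and the toy vectors are hypothetical, and the learnable distance of the Relation network is not shown.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def sq_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def class_probs(query, prototypes, metric="euclidean"):
    """P(y = c | x, S) as a softmax over similarity scores to each prototype."""
    if metric == "cosine":
        scores = {c: cosine(query, p) for c, p in prototypes.items()}
    elif metric == "euclidean":
        scores = {c: -sq_euclidean(query, p) for c, p in prototypes.items()}  # nearer = higher
    else:
        raise ValueError(f"unknown metric: {metric}")
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}

prototypes = {"Attack": [1.0, 0.0], "Meet": [0.0, 1.0], "NULL": [-1.0, -1.0]}
query = [0.9, 0.1]
for metric in ("cosine", "euclidean"):
    probs = class_probs(query, prototypes, metric)
    print(metric, max(probs, key=probs.get))  # Attack in both cases
```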

Training Objectives


  • Maximum likelihood (query loss)
    $$L_{query}(x,S) = -\log P(y=t \mid x,S) \tag{1}$$
    where $x$, $t$, and $S$ are the query instance, the ground-truth label, and the support set, respectively.

  • Intra-cluster matching

    $$L_{intra}=\sum_{i=1}^{N}\sum_{k=1}^{K}\sum_{j=k+1}^{K}\operatorname{mse}(v_{i}^{j},v_{i}^{k}) \tag{2}$$
    where $v_{i}^{k}$ is the encoded vector of the $k$-th support instance of class $i$.

  • Inter-cluster information

    $$L_{inter}=1-\sum_{i=1}^{N}\sum_{j=i+1}^{N}\operatorname{cosine}(c_{i},c_{j}) \tag{3}$$
    where $c_{i}$ is the prototype of class $i$.

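  • Loss function

    The final loss is a weighted sum of the three terms:
    $$L = L_{query}(x,S) + \beta L_{intra} + \gamma L_{inter}$$
    where $\beta$ and $\gamma$ are weighting hyperparameters.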
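The training objectives can be written out in plain Python on toy vectors. This is a sketch under stated assumptions: `mse` is assumed to average over vector dimensions, Eq. (3) is implemented literally as written above, the sums run over whatever classes are passed in, and `total_loss` assumes a weighted sum with hypothetical weights `beta` and `gamma`.

```python
import math

def mse(a, b):
    """Mean squared error between two vectors (averaged over dimensions)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def query_loss(probs, true_label):
    """Eq. (1): L_query = -log P(y = t | x, S)."""
    return -math.log(probs[true_label])

def intra_loss(support_vectors):
    """Eq. (2): MSE summed over every pair of support vectors within each class."""
    total = 0.0
    for vecs in support_vectors.values():       # class i
        for k in range(len(vecs)):
            for j in range(k + 1, len(vecs)):   # j > k, so each pair is counted once
                total += mse(vecs[j], vecs[k])
    return total

def inter_loss(prototypes):
    """Eq. (3) as written: 1 minus the summed cosine over all prototype pairs i < j."""
    cs = list(prototypes.values())
    pair_sum = sum(cosine(cs[i], cs[j])
                   for i in range(len(cs)) for j in range(i + 1, len(cs)))
    return 1.0 - pair_sum

def total_loss(l_query, l_intra, l_inter, beta=1.0, gamma=1.0):
    """Assumed weighted sum of the three terms; beta and gamma are hypothetical weights."""
    return l_query + beta * l_intra + gamma * l_inter

support_vectors = {"Attack": [[0.0, 0.0], [2.0, 0.0]], "Meet": [[1.0, 1.0]]}
prototypes = {"Attack": [1.0, 0.0], "Meet": [0.0, 1.0]}
lq = query_loss({"Attack": 0.5, "Meet": 0.5}, "Attack")  # log 2
li = intra_loss(support_vectors)                           # mse([2, 0], [0, 0]) = 2.0
le = inter_loss(prototypes)                                # 1 - cos(orthogonal) = 1.0
print(total_loss(lq, li, le, beta=0.5, gamma=0.1))
```

Only $L_{query}$ depends on the query; the other two terms are computed from the support set alone, which is why they add training signal beyond the query-support matching.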



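5. Results

  • Table 1 shows:
    • 5+1-way 5-shot always outperforms 10+1-way 10-shot, because the latter has twice as many classes to distinguish.
    • Proto and Proto+Att are the best-performing models.
    • In 10+1-way 10-shot, Proto+Att performs slightly better than Proto.
  • Table 2 shows:
    • With the proposed loss function, the F1 score of every neural model improves noticeably.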



