Summary of the models


This is a summary of the models available in 🤗 Transformers. It assumes you're familiar with the original transformer model. For a gentle introduction, check the annotated transformer. Here we focus on the high-level differences between the models; you can check each of them in more detail in their respective documentation. Also check out the pretrained model page to see the checkpoints available for each type of model and all the community models.

Each one of the models in the library falls into one of the following categories:

Autoregressive models

Autoregressive models are pretrained on the classic language modeling task: guessing the next token after having read all the previous ones. They correspond to the decoder of the original transformer model, and a mask is applied over the full sentence so that the attention heads can only see what came before in the text, not what comes after. Although those models can be fine-tuned and achieve great results on many tasks, their most natural application is text generation. A typical example of such models is GPT.
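As a concrete illustration, here is a minimal text-generation sketch using the 🤗 Transformers generation API with a GPT-2 checkpoint; the checkpoint name and generation settings are only illustrative defaults, not the required way to use autoregressive models.

```python
# Minimal text-generation sketch with an autoregressive model (GPT-2).
# The checkpoint name and generation parameters are illustrative.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The transformer architecture is"
inputs = tokenizer(prompt, return_tensors="pt")

# Each new token is predicted from the tokens that precede it
# (a causal mask hides everything to the right).
output_ids = model.generate(**inputs, max_length=30, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```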

Original GPT

GPT-2

CTRL

Transformer-XL

Reformer

XLNet

Autoencoding models

Autoencoding models are pretrained by corrupting the input tokens in some way and trying to reconstruct the original sentence. They correspond to the encoder of the original transformer model in the sense that they get access to the full inputs without any mask. Those models usually build a bidirectional representation of the whole sentence. They can be fine-tuned and achieve great results on many tasks such as text generation, but their most natural application is sentence classification or token classification. A typical example of such models is BERT.
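To make the pretraining objective concrete, here is a minimal fill-mask sketch with a BERT checkpoint; the checkpoint name is illustrative, and any masked language model from the list below would work the same way.

```python
# Minimal masked-language-modeling sketch with an autoencoding model (BERT).
# The checkpoint name is illustrative.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model sees the whole sentence (no causal mask) and predicts the corrupted token.
for prediction in fill_mask("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```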

Note that the only difference between autoregressive models and autoencoding models is in the way the model is pretrained. Therefore, the same architecture can be used for both autoregressive and autoencoding models. When a given model has been used for both types of pretraining, we have put it in the category corresponding to the article where it was first introduced.

BERT

ALBERT

RoBERTa

DistilBERT

XLM

XLM-RoBERTa

FlauBERT

ELECTRA

Funnel Transformer

Longformer

Sequence-to-sequence models

Sequence-to-sequence models use both the encoder and the decoder of the original transformer, either for translation tasks or by casting other tasks as sequence-to-sequence problems. They can be fine-tuned on many tasks, but their most natural applications are translation, summarization and question answering. The original transformer model is an example of such a model (only for translation); T5 is an example that can be fine-tuned on other tasks.
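As a sketch of the encoder-decoder workflow, the snippet below uses a T5 checkpoint with the task-prefix convention T5 was trained with; the checkpoint name, prefix and generation settings are illustrative.

```python
# Minimal sequence-to-sequence sketch with T5 (translation via a task prefix).
# Checkpoint name, prefix and generation settings are illustrative.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The encoder reads the full input; the decoder then generates the target autoregressively.
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
output_ids = model.generate(**inputs, max_length=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```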

BART

Pegasus

MarianMT

T5

MT5

MBart

ProphetNet

XLM-ProphetNet

Multimodal models

Multimodal models mix text inputs with other kinds of inputs (e.g., images) and are more specific to a given task.

MMBT

Retrieval-based models

Some models use document retrieval during (pre)training and inference, for example for open-domain question answering.
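As a rough sketch of the retrieval step, the snippet below scores passages against a question with DPR's dense encoders by taking the dot product of their embeddings; the checkpoint names and example texts are illustrative, and a real system would retrieve from a large indexed corpus rather than two in-memory strings.

```python
# Minimal dense-retrieval sketch with DPR: rank passages by the dot product
# of question and passage embeddings. Checkpoint names are illustrative.
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")

question = "What is the capital of France?"
passages = [
    "Paris is the capital and largest city of France.",
    "The Nile is a major river in northeastern Africa.",
]

q_emb = q_encoder(**q_tokenizer(question, return_tensors="pt")).pooler_output
ctx_emb = ctx_encoder(**ctx_tokenizer(passages, return_tensors="pt", padding=True)).pooler_output

# Higher dot product means the passage is a better match for the question.
scores = torch.matmul(q_emb, ctx_emb.T)
print(scores)
```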

DPR

RAG

