In this release, we support both abstractive and extractive text summarization.
New Model: UniLM
UniLM is a state-of-the-art model developed by Microsoft Research Asia (MSRA). The model is pre-trained on a large unlabeled natural language corpus (English Wikipedia and BookCorpus) and can be fine-tuned on different types of labeled data for various NLP tasks such as text classification and abstractive summarization.
For more info about UniLM, please refer to the following:
- Paper: Unified Language Model Pre-training for Natural Language Understanding and Generation
- GitHub: https://github.com/microsoft/unilm
Thanks to the UniLM team, Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon, for their great work and support for the integration.
New Model: BERTSum
BERTSum is an encoder architecture designed for text summarization. It can be used together with different decoders to support both extractive and abstractive summarization.
Pre-trained model: bert-base-uncased (extractive and abstractive)
- Paper: Fine-tune BERT for Extractive Summarization
- Paper: Text Summarization with Pretrained Encoders
- GitHub: https://github.com/nlpyang/PreSumm/
Thanks to the original authors Yang Liu and Mirella Lapata for their great contribution.
All model implementations support distributed training and multi-GPU inferencing. For abstractive summarization, we also support mixed-precision training and inference.
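As a rough illustration of the extractive mode described for BERTSum above: an extractive summarizer typically assigns a relevance score to each sentence and keeps the highest-scoring ones. The sketch below covers only that final selection step in plain Python. The function name `select_top_sentences`, its parameters, and the example scores are hypothetical and are not part of this release's API; in practice the scores would come from a fine-tuned model.

```python
from typing import List, Sequence


def select_top_sentences(
    sentences: Sequence[str],
    scores: Sequence[float],
    k: int = 3,
) -> List[str]:
    """Return the k highest-scoring sentences in their original document order.

    Hypothetical helper for illustration only; the scores are assumed to be
    produced elsewhere (e.g. by a sentence-level classifier).
    """
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    # Rank sentence indices by score, highest first, and keep the top k.
    top_indices = sorted(
        range(len(sentences)), key=lambda i: scores[i], reverse=True
    )[:k]
    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(top_indices)]


if __name__ == "__main__":
    doc = [
        "The storm hit the coast on Monday.",
        "Local cafes reported brisk sales.",
        "Thousands of homes lost power.",
        "Crews expect repairs to take a week.",
    ]
    # Made-up scores standing in for model output.
    example_scores = [0.91, 0.12, 0.85, 0.40]
    print(" ".join(select_top_sentences(doc, example_scores, k=2)))
```

Keeping the selected sentences in document order, rather than score order, is a common choice because it preserves the narrative flow of the source text.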