Code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Overview

EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation

This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation".

Requirements

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

pip install -r requirements.txt

Download checkpoints

Download the vocabulary file of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the pre-trained checkpoint of BERT-base (uncased) from HERE, and put it into ./pretrained_ckpt/.
Download the 2nd general distillation checkpoint of TinyBERT from HERE, and extract them into ./pretrained_ckpt/.

Prepare dataset

Download the latest dump of Wikipedia from HERE, and extract it into ./dataset/pretrain_data/download_wikipedia/.
Download a mirror of BooksCorpus from HERE, and extract it into ./dataset/pretrain_data/download_bookcorpus/.

- Pre-training data

bash create_pretrain_data.sh
bash create_pretrain_feature.sh

The features of Wikipedia, BooksCorpus, and their concatenation will be saved into ./dataset/pretrain_data/wikipedia_nomask/, ./dataset/pretrain_data/bookcorpus_nomask/, and ./dataset/pretrain_data/wiki_book_nomask/, respectively.

- Fine-tuning data

Download the GLUE dataset using the script in HERE, and put the files into ./dataset/glue/.
Download the SQuAD v1.1 and v2.0 datasets from the following links:

and put them into ./dataset/squad/.

Pre-train the supernet

bash pretrain_supernet.sh

The checkpoints will be saved into ./exp/pretrain/supernet/, and the names of the sub-directories should be modified into stage1_2 and stage3 correspondingly.

We also provide the checkpoint of the supernet in stage 3 (pre-trained with both Wikipedia and BooksCorpus) at HERE.

Train the teacher model (BERT$_{\rm BASE}$)

bash train.sh

The checkpoints will be saved into ./exp/train/bert_base/, and the names of the sub-directories should be modified into the corresponding task name (i.e., mnli, qqp, qnli, sst-2, cola, sts-b, mrpc, rte, wnli, squad1.1, and squad2.0). Each sub-directory contains a checkpoint named best_model.bin.

Conduct NAS (including search stage 1, 2, and 3)

bash ffn_search.sh

The checkpoints will be saved into ./exp/ffn_search/.

Distill the student model

- TinyBERT$_4$, TinyBERT$_6$

bash finetune.sh

The checkpoints will be saved into ./exp/downstream/tiny_bert/.

- EfficientBERT$_{\rm TINY}$, EfficientBERT, EfficientBERT+, EfficientBERT++

bash nas_finetune.sh

The above script will first pre-train the student models based on the pre-trained checkpoint of the supernet in stage 3, and save the pre-trained checkpoints into ./exp/pretrain/auto_bert/. Then fine-tune it on the downstream datasets, and save the fine-tuned checkpoints into ./exp/downstream/auto_bert/.

We also provide the pre-trained checkpoints of the student models (including EfficientBERT$_{\rm TINY}$, EfficientBERT, and EfficientBERT++) at HERE.

- EfficientBERT (TinyBERT$_6$)

bash nas_finetune_transfer.sh

The pre-trained and fine-tuned checkpoints will be saved into ./exp/pretrain/auto_tiny_bert/ and ./exp/downstream/auto_tiny_bert/, respectively.

Test on the GLUE dataset

bash test.sh

The test results will be saved into ./test_results/.

Reference

If you find this code helpful for your research, please cite the following paper.

@inproceedings{dong2021efficient-bert,
  title     = {{E}fficient{BERT}: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation},
  author    = {Chenhe Dong and Guangrun Wang and Hang Xu and Jiefeng Peng and Xiaozhe Ren and Xiaodan Liang},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2021},
  year      = {2021}
}
Owner
Chenhe Dong
Chenhe Dong
The first online catalogue for Arabic NLP datasets.

Masader The first online catalogue for Arabic NLP datasets. This catalogue contains 200 datasets with more than 25 metadata annotations for each datas

ARBML 94 Dec 26, 2022
A machine learning model for analyzing text for user sentiment and determine whether its a positive, neutral, or negative review.

Sentiment Analysis on Yelp's Dataset Author: Roberto Sanchez, Talent Path: D1 Group Docker Deployment: Deployment of this application can be found her

Roberto Sanchez 0 Aug 04, 2021
Google's Meena transformer chatbot implementation

Here's my attempt at recreating Meena, a state of the art chatbot developed by Google Research and described in the paper Towards a Human-like Open-Domain Chatbot.

Francesco Pham 94 Dec 25, 2022
Code for the Python code smells video on the ArjanCodes channel.

7 Python code smells This repository contains the code for the Python code smells video on the ArjanCodes channel (watch the video here). The example

55 Dec 29, 2022
texlive expressions for documents

tex2nix Generate Texlive environment containing all dependencies for your document rather than downloading gigabytes of texlive packages. Installation

Jörg Thalheim 70 Dec 26, 2022
An open collection of annotated voices in Japanese language

声庭 (Koniwa): オープンな日本語音声とアノテーションのコレクション Koniwa (声庭): An open collection of annotated voices in Japanese language 概要 Koniwa(声庭)は利用・修正・再配布が自由でオープンな音声とアノテ

Koniwa project 32 Dec 14, 2022
Python functions for summarizing and improving voice dictation input.

Helpmespeak Help me speak uses Python functions for summarizing and improving voice dictation input. Get started with OpenAI gpt-3 OpenAI is a amazing

Margarita Humanitarian Foundation 6 Dec 17, 2022
Bidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.

anaGo anaGo is a Python library for sequence labeling(NER, PoS Tagging,...), implemented in Keras. anaGo can solve sequence labeling tasks such as nam

Hiroki Nakayama 1.5k Dec 05, 2022
PyTorch source code of NAACL 2019 paper "An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models"

This repository contains source code for NAACL 2019 paper "An Embarrassingly Simple Approach for Transfer Learning from Pretrained Language Models" (P

Alexandra Chronopoulou 89 Aug 12, 2022
Athena is an open-source implementation of end-to-end speech processing engine.

Athena is an open-source implementation of end-to-end speech processing engine. Our vision is to empower both industrial application and academic research on end-to-end models for speech processing.

Ke Technologies 34 Sep 08, 2022
A Word Level Transformer layer based on PyTorch and 🤗 Transformers.

Transformer Embedder A Word Level Transformer layer based on PyTorch and 🤗 Transformers. How to use Install the library from PyPI: pip install transf

Riccardo Orlando 27 Nov 20, 2022
official ( API ) for the zAmericanEnglish app in [ Google play ] and [ App store ]

official ( API ) for the zAmericanEnglish app in [ Google play ] and [ App store ]

Plugin 3 Jan 12, 2022
Coreference resolution for English, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, msg systems ag 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 German 1.2.3 Polish 1

msg systems ag 169 Dec 21, 2022
Machine Learning Course Project, IMDB movie review sentiment analysis by lstm, cnn, and transformer

IMDB Sentiment Analysis This is the final project of Machine Learning Courses in Huazhong University of Science and Technology, School of Artificial I

Daniel 0 Dec 27, 2021
wxPython app for converting encodings, modifying and fixing SRT files

Subtitle Converter Program za obradu srt i txt fajlova. Requirements: Python version 3.8 wxPython version 4.1.0 or newer Libraries: srt, PyDispatcher

4 Nov 25, 2022
Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Chinese NER(Named Entity Recognition) using BERT(Softmax, CRF, Span)

Weitang Liu 1.6k Jan 03, 2023
Deeply Supervised, Layer-wise Prediction-aware (DSLP) Transformer for Non-autoregressive Neural Machine Translation

Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision Training Efficiency We show the training efficiency of our DSLP model b

Chenyang Huang 37 Jan 04, 2023
GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning

GrammarTagger — A Neural Multilingual Grammar Profiler for Language Learning GrammarTagger is an open-source toolkit for grammatical profiling for lan

Octanove Labs 27 Jan 05, 2023
An easy to use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving up state-of-the-art NLP models.

Welcome to AdaptNLP A high level framework and library for running, training, and deploying state-of-the-art Natural Language Processing (NLP) models

Novetta 407 Jan 03, 2023
Python library for Serbian Natural language processing (NLP)

SrbAI - Python biblioteka za procesiranje srpskog jezika SrbAI je projekat prikupljanja algoritama i modela za procesiranje srpskog jezika u jedinstve

Serbian AI Society 3 Nov 22, 2022