Source code for "Efficient Training of BERT by Progressively Stacking"

Overview

Introduction

This repository is the code to reproduce the result of Efficient Training of BERT by Progressively Stacking. The code is based on Fairseq.

Requirements and Installation

  • PyTorch >= 1.0.0
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • Python version 3.7

After PyTorch is installed, you can install requirements with:

pip install -r requirements.txt

Getting Started

Step 1:

bash install.sh

This script downloads:

  1. Moses Decoder
  2. Subword NMT
  3. Fast BPE (In the next steps, we use Subword NMT instead of Fast BPE. Recommended if you want to generate your own dictionary on a large-scale dataset.)

These library will do cleaning, tokenization, and BPE encoding for GLUE data in step 3. They will also be helpful if you want to make your own corpus for BERT training or if you want to test our model on your own tasks.

Step 2:

bash reproduce_bert.sh

This script runs progressive stacking and train a BERT. The code is tested on 4 Tesla P40 GPUs (24GB Gmem). For different hardware, you probably need to change the maximum number of tokens per batch (by changing max-tokens and update-freq).

Step 3:

bash reproduce_glue.sh

This script fine-tunes the BERT trained in step 2. The script chooses the checkpoint trained for 400K steps, which is the same as the stacking model in our paper.

Cite

@InProceedings{pmlr-v97-gong19a,
  title = 	 {Efficient Training of {BERT} by Progressively Stacking},
  author = 	 {Gong, Linyuan and He, Di and Li, Zhuohan and Qin, Tao and Wang, Liwei and Liu, Tieyan},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {2337--2346},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Long Beach, California, USA},
  month = 	 {09--15 Jun},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/gong19a/gong19a.pdf},
  url = 	 {http://proceedings.mlr.press/v97/gong19a.html},
}
Owner
Gong Linyuan
Gong Linyuan
PyDiscord, a maintained fork of discord.py, is a python wrapper for the Discord API.

discord.py A modern, easy to use, feature-rich, and async ready API wrapper for Discord written in Python. The Future of discord.py Please read the gi

Omkaar 1 Jan 16, 2022
Ice-Userbot adalah userbot Telegram modular yang berjalan di Python3 dengan database sqlalchemy

Ice-Userbot Telegram Ice-Userbot adalah userbot Telegram modular yang berjalan di Python3 dengan database sqlalchemy. Berbasis Paperplane dan ProjectB

6 Apr 29, 2022
A Telegram Bot to Extract Various Types Of Archives

IDN Unzip Bot A Telegram Bot to Extract Various Types Of Archives Features Extract various types of archives like rar, zip, tar, 7z, tar.xz etc. Passw

IDNCoderX 8 Jul 25, 2022
Telegram Bot for saving and sharing personal and group notes

EZ Notes Bot (ezNotesBot) Telegram Bot for saving and sharing personal and group notes. Usage Personal notes: reply to any message in PM to save it as

Dash Eclipse 8 Nov 07, 2022
discord.py bot written in Python.

bakerbot Bakerbot is a discord.py bot written in Python :) Originally made as a learning exercise, now used by friends as a somewhat useful bot and us

8 Dec 04, 2022
OpenSea Bulk Uploader And Trader 100000 NFTs (MAC WINDOWS ANDROID LINUX) Automatically and massively upload and sell your non-fungible tokens on OpenSea using Python Selenium

OpenSea Bulk Uploader And Trader 100000 NFTs (MAC WINDOWS ANDROID LINUX) Automatically and massively upload and sell your non-fungible tokens on OpenS

ERC-7211 3 Mar 24, 2022
Yes, it's true :two_hearts: This repository has 316 stars.

Yes, it's true! Inspired by a similar repository from @RealPeha, but implemented using a webhook on AWS Lambda and API Gateway, so it's serverless! If

510 Dec 28, 2022
A Discord Server Cloner Which Can Clone Any Discord Server In Just Few Minutes

A Discord Server Cloner Which Can Clone Any Discord Server In Just Few Minutes.

samet 4 Jul 23, 2022
Telegram Bot to learn English by words and more.. ( in Arabic )

Get the mp3 files Extract the mp3.rar on the same file that bot.py on install requirements pip install -r requirements.txt #Then enter you bot token

Plugin 10 Feb 19, 2022
Streaming Finance Data with AWS Lambda

A data pipeline consisting of an AWS lambda function reading data from yfinance API, an AWS Kinesis stream to receive & store data in S3 buckets and AWS Glue crawler & Athena to run SQL queries.

Aarif Munwar Jahan 4 Aug 30, 2022
Nflmetrics - Johns Hopkins Spring 2022 Sports Analytics research project about NFL Draft Metrics

nflmetrics GitHub repo for Johns Hopkins Spring 2022 Sports Analytics research p

Anish Kulkarni 4 Feb 24, 2022
Project developed as part of a selection process for the company Denox

📝 Tabela de conteúdos Sobre Requisitos para rodar o projeto Instalação Rotas da API Observações 🧐 Sobre Projeto desenvolvido como parte de um proces

Ícaro Sant'Ana 1 Mar 01, 2022
A EddieHub API python package.

EddieHub A EddieHub API python package. Made with Python3 (C) @FayasNoushad Copyright permission under MIT License License - https://github.com/Fayas

Fayas Noushad 5 Sep 22, 2021
A python package to easy the integration with Direct Online Pay (Mpesa, TigoPesa, AirtelMoney, Card Payments)

A python package to easy the integration with Direct Online Pay (DPO) which easily allow you easily integrate with payment options once without having to deal with each of them individually;

Jordan Kalebu 2 Nov 25, 2021
A python script that automatically farms the Discord bot 'Dank Memer'.

Dank Farmer A python script that automatically farms the Discord bot 'Dank Memer'. Requirements pynput Disclaimer DO NOT use if you are not willing to

2 Dec 30, 2021
Fetch tracking numbers of Amazon orders, for the ease of the logistics.

Amazon-Tracking-Number Fetch tracking numbers of Amazon orders, for the ease of the logistics. Read Me First (How to use this code): Get Amazon "Items

Tony Yao 1 Nov 02, 2021
Auto file forward bot with python

Auto-File-Forward-Bot Auto file forward bot. Without Admin Permission in FROM_CHANNEL Only Give Permission In your Telegram Personal Channel Please fo

Milas 1 Oct 15, 2021
stories-matiasucker created by GitHub Classroom

Stories do Instagram Este projeto tem como objetivo desenvolver uma pequena aplicação que simule os efeitos e funcionalidades ao estilo Instagram. A a

1 Dec 20, 2021
A small bot to interact with the reddit API. Get top viewers and update the sidebar widget.

LiveStream_Reddit_Bot Get top twitch and facebook stream viewers for a game and update the sidebar widget and old reddit sidebar to show your communit

Tristan Wise 1 Nov 21, 2021
ByDiego Token Grabber is a Discord Stealer

ByDiego Token Grabber is a Discord Stealer. This way you can get too much information from x person if you pass it on and open it

zByDiegoM.T 4 Mar 11, 2022