GLM (General Language Model)

GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.
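
To make the objective concrete, here is a minimal sketch of the blank-filling format, with token names taken from the interactive example later in this README ([MASK], <|startofpiece|>); the repo's actual preprocessing works on token ids and samples span positions and lengths differently:

# Simplified illustration of autoregressive blank filling (not the repo's
# preprocessing code): a span is replaced by [MASK] in the source, and the
# model generates the missing tokens after a sentinel, left to right.
def make_blank_filling_example(tokens, span):
    start, end = span
    source = tokens[:start] + ["[MASK]"] + tokens[end:]
    target = ["<|startofpiece|>"] + tokens[start:end]
    return source, target

source, target = make_blank_filling_example(
    ["GLM", "is", "a", "general", "language", "model"], (3, 5))
print(source)  # ['GLM', 'is', 'a', '[MASK]', 'model']
print(target)  # ['<|startofpiece|>', 'general', 'language']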

Please refer to our paper for a detailed description of GLM:

All NLP Tasks Are Generation Tasks: A General Pretraining Framework

Zhengxiao Du*, Yujie Qian*, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang (*: equal contribution)

Part of the code is based on Megatron-LM and PET.

Pretrained Models

You can download the pretrained models used in the paper here.

Name                     Params   File                              Config
GLM-Base                 110M     glm-base-blank.tar.bz2            model_blocklm_base.sh
GLM-Large                335M     glm-large-blank.tar.bz2           model_blocklm_large.sh
GLM-Large (multi-task)   335M     glm-large-generation.tar.bz2      model_blocklm_large_generation.sh
GLM-410M (multi-task)    410M     glm-1.25-generation.tar.bz2       model_blocklm_1.25_generation.sh
GLM-515M (multi-task)    515M     glm-1.5-generation.tar.bz2        model_blocklm_1.5_generation.sh
GLM-RoBERTa              335M     glm-roberta-large-blank.tar.bz2   model_blocklm_roberta_large.sh

Installation

Clone this repo

git clone https://github.com/THUDM/GLM
cd GLM

Please first install PyTorch (we use 1.7.0) and apex, then install the other dependencies with:

pip install -r requirements.txt

Usage

We provide scripts for finetuning GLM on some downstream tasks.

SuperGLUE

  • Download the SuperGLUE data and check the experiment setup in scripts/finetune_superglue.sh. Note that DATA_ROOT, CHECKPOINT_PATH, and SAVE_PATH need to be changed to your local paths. You may also change the batch size and nproc_per_node according to your available hardware. We suggest using an aggregated batch size of 64 for MultiRC and ReCoRD and 16 for the other tasks.

  • Run the following script (using the COPA dataset as an example):

bash scripts/finetune_superglue.sh \
     config_tasks/model_blocklm_roberta_large.sh \
     config_tasks/task_copa.sh
  • To apply GLM to a new NLU dataset with cloze-filling finetuning, implement a DataProcessor in tasks/superglue/dataset.py for data loading and add a PVP in tasks/superglue/pvp.py for the cloze question (see the sketch after this list). More details can be found here.

  • The cloze questions (prompts) used in this work are written by humans. We are also studying a P-tuning (prompt tuning) approach that searches for the optimal continuous prompt. Please refer to our paper and code.
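
The sketch below shows the rough shape of such an extension. The class and method names are illustrative assumptions modeled on PET-style processors and PVPs, not the repo's actual interfaces; check tasks/superglue/dataset.py and tasks/superglue/pvp.py for the real base classes and signatures.

# Hypothetical sketch of a new cloze-style task; all names below are
# illustrative assumptions, not APIs from this repo.
import os

class MyTaskProcessor:  # would subclass the repo's DataProcessor
    """Loads examples as (text, label) pairs from TSV files."""
    def get_labels(self):
        return ["yes", "no"]

    def get_examples(self, data_dir, split):
        examples = []
        with open(os.path.join(data_dir, split + ".tsv")) as f:
            for line in f:
                text, label = line.rstrip("\n").split("\t")
                examples.append({"text": text, "label": label})
        return examples

class MyTaskPVP:  # would subclass the repo's PVP
    """Turns each example into a cloze question and verbalizes the labels."""
    def get_parts(self, example):
        # The model fills [MASK] with one of the verbalizer tokens below.
        return [example["text"], " Is this correct? Answer: ", "[MASK]"]

    def verbalize(self, label):
        return {"yes": "Yes", "no": "No"}[label]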

Text Summarization

  • Download the Gigaword dataset and check the experiment setup in scripts/finetune_seq2seq.sh. Change DATA_ROOT, CHECKPOINT_PATH, and SAVE_PATH to your local paths.

  • Run the following script:

bash scripts/finetune_seq2seq.sh \
     config_tasks/model_blocklm_large_generation.sh \
     config_tasks/seq_gigaword.sh
  • To calculate ROUGE scores, install file2rouge from here and run bash scripts/evaluate_seq2seq.sh (a simplified illustration of the metric follows below).
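
For intuition about what the metric measures, here is a self-contained sketch of ROUGE-1 F1 (unigram overlap). The official scores are produced by the file2rouge pipeline, not by this simplified code:

# Self-contained sketch of ROUGE-1 F1 (unigram overlap between a candidate
# summary and a reference). Illustrative only; the official evaluation uses
# file2rouge.
from collections import Counter

def rouge1_f1(candidate, reference):
    cand, ref = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

print(rouge1_f1("police arrest suspect", "police arrested the suspect"))  # ~0.57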

Language Modeling

LAMBADA Cloze Accuracy

bash scripts/evaluate_lm.sh \
     config_tasks/model_blocklm_large_generation.sh \
     config_tasks/zero_lambada.sh
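
LAMBADA scores the model on predicting the final word of each passage exactly. A minimal sketch of the metric, where predict_last_word is a hypothetical stand-in for a model call, not an API from this repo:

# What LAMBADA cloze accuracy measures: exact-match prediction of the last
# word of each passage. `predict_last_word` is a hypothetical stand-in for
# the model.
def lambada_accuracy(passages, predict_last_word):
    correct = 0
    for passage in passages:
        *context, answer = passage.split()
        if predict_last_word(" ".join(context)) == answer:
            correct += 1
    return correct / len(passages)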

LM Perplexity

  • Download our test set of wikibook (or any dataset following the same format) and change DATA_ROOT and CHECKPOINT_PATH in scripts/evaluate_lm.sh.
  • Run the following script (a sketch of the perplexity computation itself follows this list):
    bash scripts/evaluate_lm.sh \
       config_tasks/model_blocklm_large_generation.sh \
       config_tasks/zero_lm.sh
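
Perplexity is the exponential of the average per-token negative log-likelihood. A minimal sketch, where the token NLLs would come from the model's cross-entropy losses over the evaluation set (this is not the repo's evaluation code):

import math

# Perplexity = exp(mean negative log-likelihood per token). The NLL values
# below are made-up numbers for illustration.
def perplexity(token_nlls):
    return math.exp(sum(token_nlls) / len(token_nlls))

print(perplexity([2.1, 3.0, 2.4]))  # ~12.18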

Blank Language Model

  • Download the Yahoo dataset and check the experiment setup in scripts/finetune_blank.sh. Change DATA_ROOT, CHECKPOINT_PATH, and SAVE_PATH to your local paths.

  • Run the following script:

bash scripts/finetune_blank.sh \
     config_tasks/model_blocklm_large.sh \
     config_tasks/seq_blank.sh

Blank Filling (Interactive)

  • Change CHECKPOINT_PATH to your local path, then run the following script:
bash scripts/generate_block.sh \
     config_tasks/model_blocklm_large.sh

Example:

Context: Ng is an adjunct professor at [MASK] (formerly associate professor and Director of its Stanford AI Lab or SAIL ). Also a pioneer in online education, Ng co-founded Coursera and deeplearning.ai.

GLM: [CLS] ng is an adjunct professor at [MASK] ( formerly associate professor and director of its stanford ai lab or sail ) . also a pioneer in online education , ng co - founded coursera and deeplearning . ai . [PAD] <|startofpiece|> the stanford university
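
The text generated after <|startofpiece|> is the model's filling for the [MASK] slot. A small sketch of stitching the output back into the prompt (the interactive script may format its output differently):

# Stitch the generated piece back into the original context. Mirrors the
# example above; the actual script's output formatting may differ.
def fill_blank(context, model_output):
    piece = model_output.split("<|startofpiece|>", 1)[1].strip()
    return context.replace("[MASK]", piece, 1)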

Citation

Please cite our paper if you find this code useful for your research:

@article{DBLP:journals/corr/abs-2103-10360,
  author    = {Zhengxiao Du and
               Yujie Qian and
               Xiao Liu and
               Ming Ding and
               Jiezhong Qiu and
               Zhilin Yang and
               Jie Tang},
  title     = {All {NLP} Tasks Are Generation Tasks: {A} General Pretraining Framework},
  journal   = {CoRR},
  volume    = {abs/2103.10360},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.10360}
}