Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Last update: Dec 27, 2022

Overview

DART

Implementation of the ICLR 2022 paper "Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners".

Environment

[email protected]
Run pip install -r requirements.txt to install the dependencies.
A wandb account is required only if you want to search for the best hyper-parameter combinations.

Data source

16-shot GLUE dataset from LM-BFF.
The generated data consists of 5 random splits (seeds 13/21/42/87/100) per task, each with 16 samples.

How to run

To run across all 5 splits of a task, use run.py:
- In the arguments, encoder="inner" is the method proposed in the paper where verbalizers are other trainable tokens; encoder="manual" means verbalizers are selected fixed tokens; encoder="lstm" refers to the P-Tuning method.

$ python run.py -h
usage: run.py [-h] [--encoder {manual,lstm,inner,inner2}] [--task TASK]
              [--num_splits NUM_SPLITS] [--repeat REPEAT] [--load_manual]
              [--extra_mask_rate EXTRA_MASK_RATE]
              [--output_dir_suffix OUTPUT_DIR_SUFFIX]

optional arguments:
  -h, --help            show this help message and exit
  --encoder {manual,lstm,inner,inner2}
  --task TASK
  --num_splits NUM_SPLITS
  --repeat REPEAT
  --load_manual
  --extra_mask_rate EXTRA_MASK_RATE
  --output_dir_suffix OUTPUT_DIR_SUFFIX, -o OUTPUT_DIR_SUFFIX
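As a rough illustration of how the flags above fit together, the sketch below assembles a run.py command line for several encoders. The helper name build_run_command, the default values, and the task name "SST-2" (borrowed from the sweep.py choices) are assumptions made for this example, not part of the repository.

```python
import shlex
import sys

# Encoder choices copied from the `run.py -h` output above.
RUN_ENCODERS = ("manual", "lstm", "inner", "inner2")


def build_run_command(task, encoder="inner", num_splits=5, repeat=1,
                      output_dir_suffix=None):
    """Return an argv list for run.py (hypothetical helper).

    The defaults here are illustrative guesses, not the script's real defaults.
    """
    if encoder not in RUN_ENCODERS:
        raise ValueError(f"unknown encoder: {encoder!r}")
    cmd = [sys.executable, "run.py",
           "--encoder", encoder,
           "--task", task,
           "--num_splits", str(num_splits),
           "--repeat", str(repeat)]
    if output_dir_suffix is not None:
        cmd += ["--output_dir_suffix", output_dir_suffix]
    return cmd


if __name__ == "__main__":
    # Print one command per encoder; nothing is executed here.
    for encoder in ("inner", "manual", "lstm"):
        cmd = build_run_command("SST-2", encoder=encoder,
                                output_dir_suffix=encoder)
        print(shlex.join(cmd))
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root to launch the run.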

To train and evaluate on a single split with details recorded, use inference.py.
- Before running, [task_name, label_list, prompt_type] should be configured in the code.
- prompt_type="none" refers to fixed verbalizer training, while "inner" refers to the method proposed in the paper. ("inner2" is the deprecated 2-stage training.)
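The three settings named above could be collected and sanity-checked along the following lines. This is only a sketch: the helper make_inference_config, the dictionary layout, and the SST-2 label strings "0" and "1" are assumptions for illustration, and the actual variables inside inference.py may be organised differently.

```python
# Prompt types named in the text above; "inner2" is the deprecated 2-stage variant.
PROMPT_TYPES = ("none", "inner", "inner2")


def make_inference_config(task_name, label_list, prompt_type="inner"):
    """Bundle and sanity-check the three settings (hypothetical helper)."""
    if prompt_type not in PROMPT_TYPES:
        raise ValueError(f"prompt_type must be one of {PROMPT_TYPES}")
    labels = list(label_list)
    if len(labels) < 2 or len(set(labels)) != len(labels):
        raise ValueError("label_list needs at least two distinct labels")
    return {
        "task_name": task_name,
        "label_list": labels,
        "prompt_type": prompt_type,
    }


# Assumed example values for a binary sentiment task.
config = make_inference_config("SST-2", ["0", "1"], prompt_type="inner")
print(config)
```

Checking the values before training starts catches a mistyped prompt type early, instead of partway through a run.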

To find the optimal hyper-parameters for each task and split and to reproduce our results, use sweep.py:
- Please refer to the WandB documentation for more details.

$ python sweep.py -h
usage: sweep.py [-h]
                [--task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}]
                [--encoder {none,mlp,lstm,inner,inner2}]
                [--seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]]
                [--batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]]
                [--sweep_id SWEEP_ID]

optional arguments:
  -h, --help            show this help message and exit
  --task {SST-2,sst-5,mr,cr,mpqa,subj,trec,CoLA,MNLI,MNLI-mm,SNLI,QNLI,RTE-glue,MRPC,QQP}
  --encoder {none,mlp,lstm,inner,inner2}
  --seed_split {13,21,42,87,100} [{13,21,42,87,100} ...]
  --batch_size {4,8,16,24,32} [{4,8,16,24,32} ...]
  --sweep_id SWEEP_ID
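The help text above implies an argument parser roughly like the one sketched here, which can be handy for checking a command line before launching a sweep. It is reconstructed from the usage output alone: the integer types for --seed_split and --batch_size and the absence of defaults are assumptions, and the real sweep.py may differ.

```python
import argparse

# Choices copied from the `sweep.py -h` output above.
TASKS = ["SST-2", "sst-5", "mr", "cr", "mpqa", "subj", "trec", "CoLA",
         "MNLI", "MNLI-mm", "SNLI", "QNLI", "RTE-glue", "MRPC", "QQP"]
SWEEP_ENCODERS = ["none", "mlp", "lstm", "inner", "inner2"]
SEED_SPLITS = [13, 21, 42, 87, 100]
BATCH_SIZES = [4, 8, 16, 24, 32]


def build_sweep_parser():
    """Mirror the documented sweep.py flags (types are assumed)."""
    parser = argparse.ArgumentParser(prog="sweep.py")
    parser.add_argument("--task", choices=TASKS)
    parser.add_argument("--encoder", choices=SWEEP_ENCODERS)
    # nargs="+" accepts one or more values, matching "{...} [{...} ...]".
    parser.add_argument("--seed_split", type=int, nargs="+",
                        choices=SEED_SPLITS)
    parser.add_argument("--batch_size", type=int, nargs="+",
                        choices=BATCH_SIZES)
    parser.add_argument("--sweep_id")
    return parser


args = build_sweep_parser().parse_args(
    ["--task", "SST-2", "--encoder", "inner",
     "--seed_split", "13", "21", "--batch_size", "8", "16"]
)
print(args.task, args.encoder, args.seed_split, args.batch_size)
```

The equivalent shell invocation would be python sweep.py --task SST-2 --encoder inner --seed_split 13 21 --batch_size 8 16.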

To train and evaluate with more customized configurations, use cli.py.
To analyze and visualize the results produced by inference.py, use visualize.py and visualize_word_emb.py.

How to Cite

@article{DBLP:journals/corr/abs-2108-13161,
  author    = {Ningyu Zhang and
               Luoqiu Li and
               Xiang Chen and
               Shumin Deng and
               Zhen Bi and
               Chuanqi Tan and
               Fei Huang and
               Huajun Chen},
  title     = {Differentiable Prompt Makes Pre-trained Language Models Better Few-shot
               Learners},
  journal   = {CoRR},
  volume    = {abs/2108.13161},
  year      = {2021},
  url       = {https://arxiv.org/abs/2108.13161},
  eprinttype = {arXiv},
  eprint    = {2108.13161},
  timestamp = {Thu, 13 Jan 2022 17:33:17 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2108-13161.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Owner

ZJUNLP
