Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021)

Overview

Length-Adaptive Transformer

This is the official Pytorch implementation of Length-Adaptive Transformer. For detailed information about the method, please refer to our paper.

Our code is based on HuggingFace's ( 🤗 ) Transformers library. Currently, it only supports limited transformers (BERT and DistilBERT) and downstream tasks (SQuAD 1.1 and GLUE benchmark). We will extend it one-by-one to support other transformers and tasks. You can easily apply our method to any other use cases beforehand.

Getting Started

Requirements

  • Python 3
  • PyTorch
  • 🤗 Transformers
  • torchprofile (to measure FLOPs)

Dataset Preparation

(Standard) Finetuning pretrained transformer

For SQuAD 1.1, use run_squad.py slightly modified from 🤗 Transformers' question-answering example.

python run_squad.py \
  --model_type bert \
  --model_name_or_path bert-base-uncased \
  --do_train \
  --do_eval \
  --evaluate_during_training \
  --save_only_best \
  --do_lower_case \
  --data_dir $SQUAD_DIR \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 5e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir $SQUAD_OUTPUT_DIR/standard

For GLUE, use run_glue.py slightly modified from 🤗 Transformers' text-classification example.

python run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR/$TASK_NAME \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir $GLUE_OUTPUT_DIR/$TASK_NAME/standard

Training with LengthDrop

Starting from a checkpoint finetuned without Drop-and-Restore, continue finetuning for additional steps with Drop-and-Restore and LengthDrop.

python run_squad.py \
  --model_type bert \
  --model_name_or_path $SQUAD_OUTPUT_DIR/standard/checkpoint-best \
  --do_train \
  --do_eval \
  --evaluate_during_training \
  --save_only_best \
  --do_lower_case \
  --data_dir $SQUAD_DIR \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --per_gpu_train_batch_size 32 \
  --per_gpu_eval_batch_size 32 \
  --learning_rate 5e-5 \
  --num_train_epochs 5.0 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir $SQUAD_OUTPUT_DIR/length_adaptive \
  --length_adaptive \
  --num_sandwich 2 \
  --length_drop_ratio_bound 0.2 \
  --layer_dropout_prob 0.2 \
python run_glue.py \
  --model_name_or_path $GLUE_OUTPUT_DIR/$TASK_NAME/standard/checkpoint-best \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --data_dir $GLUE_DIR/$TASK_NAME \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --per_device_eval_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 5.0 \
  --output_dir $GLUE_OUTPUT_DIR/$TASK_NAME/length_adaptive
  --length_adaptive \
  --num_sandwich 2 \
  --length_drop_ratio_bound 0.2 \
  --layer_dropout_prob 0.2 \

Evolutionary Search of Length Configurations

After training with LengthDrop, perform an evolutionary search to find length configurations for anytime prediction.

python run_squad.py \
  --model_type bert \
  --model_name_or_path $SQUAD_OUTPUT_DIR/length_adaptive/checkpoint-best \
  --do_search \
  --do_lower_case \
  --data_dir $SQUAD_DIR \
  --train_file train-v1.1.json \
  --predict_file dev-v1.1.json \
  --per_gpu_eval_batch_size 32 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir $SQUAD_OUTPUT_DIR/evolutionary_search \
  --evo_iter 30 \
  --mutation_size 30 \
  --crossover_size 30 \
python run_glue.py \
  --model_name_or_path $GLUE_OUTPUT_DIR/$TASK_NAME/length_adaptive/checkpoint-best \
  --task_name $TASK_NAME \
  --do_search \
  --data_dir $GLUE_DIR/$TASK_NAME \
  --max_seq_length 128 \
  --per_device_eval_batch_size 32 \
  --output_dir $GLUE_OUTPUT_DIR/$TASK_NAME/evolutionary_search
  --evo_iter 30 \
  --mutation_size 30 \
  --crossover_size 30 \

License

Copyright 2020-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Owner
Clova AI Research
Open source repository of Clova AI Research, NAVER & LINE
Clova AI Research
Agile SVG maker for python

Agile SVG Maker Need to draw hundreds of frames for a GIF? Need to change the style of all pictures in a PPT? Need to draw similar images with differe

SemiWaker 4 Sep 25, 2022
Using Self-Supervised Pretext Tasks for Active Learning - Official Pytorch Implementation

Using Self-Supervised Pretext Tasks for Active Learning - Official Pytorch Implementation Experiment Setting: CIFAR10 (downloaded and saved in ./DATA

John Seon Keun Yi 38 Dec 27, 2022
Explainable Medical ImageSegmentation via GenerativeAdversarial Networks andLayer-wise Relevance Propagation

MedAI: Transparency in Medical Image Segmentation What is this repo This repo contains the code and experiments that are implemented to contribute in

Awadelrahman M. A. Ahmed 1 Nov 22, 2021
基于Paddlepaddle复现yolov5,支持PaddleDetection接口

PaddleDetection yolov5 https://github.com/Sharpiless/PaddleDetection-Yolov5 简介 PaddleDetection飞桨目标检测开发套件,旨在帮助开发者更快更好地完成检测模型的组建、训练、优化及部署等全开发流程。 PaddleD

36 Jan 07, 2023
BasicNeuralNetwork - This project looks over the basic structure of a neural network and how machine learning training algorithms work

BasicNeuralNetwork - This project looks over the basic structure of a neural network and how machine learning training algorithms work. For this project, I used the sigmoid function as an activation

Manas Bommakanti 1 Jan 22, 2022
Code accompanying "Dynamic Neural Relational Inference" from CVPR 2020

Code accompanying "Dynamic Neural Relational Inference" This codebase accompanies the paper "Dynamic Neural Relational Inference" from CVPR 2020. This

Colin Graber 48 Dec 23, 2022
code for "AttentiveNAS Improving Neural Architecture Search via Attentive Sampling"

code for "AttentiveNAS Improving Neural Architecture Search via Attentive Sampling"

Facebook Research 94 Oct 26, 2022
Accepted at ICCV-2021: Workshop on Computer Vision for Automated Medical Diagnosis (CVAMD)

Is it Time to Replace CNNs with Transformers for Medical Images? Accepted at ICCV-2021: Workshop on Computer Vision for Automated Medical Diagnosis (C

Christos Matsoukas 80 Dec 27, 2022
Unofficial keras(tensorflow) implementation of MAE model from Masked Autoencoders Are Scalable Vision Learners

MAE-keras Unofficial keras(tensorflow) implementation of MAE model described in 'Masked Autoencoders Are Scalable Vision Learners'. This work has been

Yewon 11 Jun 12, 2022
Point Cloud Registration Network

PCRNet: Point Cloud Registration Network using PointNet Encoding Source Code Author: Vinit Sarode and Xueqian Li Paper | Website | Video | Pytorch Imp

ViNiT SaRoDe 59 Nov 19, 2022
MPRNet-Cloud-removal: Progressive cloud removal

MPRNet-Cloud-removal Progressive cloud removal Requirements 1.Pytorch = 1.0 2.Python 3 3.NVIDIA GPU + CUDA 9.0 4.Tensorboard Installation 1.Clone the

Semi 95 Dec 18, 2022
Code for Paper "Evidential Softmax for Sparse MultimodalDistributions in Deep Generative Models"

Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models Abstract Many applications of generative models rely on the marginali

Stanford Intelligent Systems Laboratory 9 Jun 06, 2022
Exploration & Research into cross-domain MEV. Initial focus on ETH/POLYGON.

xMEV, an apt exploration This is a small exploration on the xMEV opportunities between Polygon and Ethereum. It's a data analysis exercise on a few pa

odyslam.eth 7 Oct 18, 2022
Tzer: TVM Implementation of "Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation (OOPSLA'22)“.

Artifact • Reproduce Bugs • Quick Start • Installation • Extend Tzer Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation This is the s

12 Dec 29, 2022
[CVPR 21] Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2021.

Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting, CVPR 2021. Ayan Kumar Bhunia, Pinaki nath Chowdhury, Yongxin Yan

Ayan Kumar Bhunia 44 Dec 12, 2022
Parallel and High-Fidelity Text-to-Lip Generation; AAAI 2022 ; Official code

Parallel and High-Fidelity Text-to-Lip Generation This repository is the official PyTorch implementation of our AAAI-2022 paper, in which we propose P

Zhying 77 Dec 21, 2022
A trusty face recognition research platform developed by Tencent Youtu Lab

Introduction TFace: A trusty face recognition research platform developed by Tencent Youtu Lab. It provides a high-performance distributed training fr

Tencent 956 Jan 01, 2023
Depression Asisstant GDSC Challenge Solution

Depression Asisstant can help you give solution. Please using Python version 3.9.5 for contribute.

Ananda Rauf 1 Jan 30, 2022
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

scikit-opt Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm,A

郭飞 3.7k Jan 03, 2023
Official PyTorch implementation of PS-KD

Self-Knowledge Distillation with Progressive Refinement of Targets (PS-KD) Accepted at ICCV 2021, oral presentation Official PyTorch implementation of

61 Dec 28, 2022