PyTorch implementation of the ACL 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Last update: Dec 28, 2022

Overview

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

This repo contains the PyTorch implementation of the ACL 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Installation

python setup.py install

How to run the models

We provide example scripts for each model in the hyperformer/scripts/ folder, with their config files in hyperformer/configs. To run the models, first cd hyperformer, then:

To run the hyperformer++ model (this model generates the task-specific adapters using a single hypernetwork that is shared across the tasks and layers of a transformer):
```
bash scripts/hyperformer++.sh
```
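The idea of one hypernetwork serving every task and layer can be sketched in a few lines. This is a minimal, dependency-free illustration, not the repository's code: the class name `SharedHypernetwork`, the single linear projection, the concatenated task and layer embeddings, and all sizes are assumptions made for the example.

```python
import random


class SharedHypernetwork:
    """Toy hypernetwork (hypothetical): one linear map from a conditioning
    embedding to the flattened weights of a bottleneck adapter."""

    def __init__(self, embed_dim, d_model, bottleneck, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.bottleneck = bottleneck
        out_dim = 2 * d_model * bottleneck  # down- plus up-projection
        self.weight = [
            [rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
            for _ in range(out_dim)
        ]

    def generate(self, embedding):
        """Return (down, up) adapter matrices for one conditioning embedding."""
        flat = [sum(w * e for w, e in zip(row, embedding)) for row in self.weight]
        split = self.d_model * self.bottleneck
        # down: bottleneck rows of d_model values; up: d_model rows of bottleneck values
        down = [flat[i:i + self.d_model] for i in range(0, split, self.d_model)]
        up = [flat[i:i + self.bottleneck] for i in range(split, len(flat), self.bottleneck)]
        return down, up


# Hypothetical task names and sizes, chosen only to keep the example small.
rng = random.Random(1)
task_emb = {t: [rng.gauss(0.0, 1.0) for _ in range(4)] for t in ("task_a", "task_b")}
layer_emb = {l: [rng.gauss(0.0, 1.0) for _ in range(4)] for l in range(2)}

hypernet = SharedHypernetwork(embed_dim=8, d_model=6, bottleneck=2)

# The same hypernetwork serves every (task, layer) pair; only the input changes.
adapters = {
    (t, l): hypernet.generate(task_emb[t] + layer_emb[l])
    for t in task_emb
    for l in layer_emb
}
```

Each (task, layer) pair gets its own adapter weights, but the only trained parameters in this sketch would be the hypernetwork weights and the embeddings.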
To run the hyperformer model (this model generates the task-specific adapters using a hypernetwork that is shared across the tasks but is specific to each layer of a transformer; it is less parameter-efficient than hyperformer++):
```
bash scripts/hyperformer.sh
```
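A rough way to see why the per-layer variant costs more parameters than hyperformer++ is to count hypernetwork weights under simplified assumptions. The function names, the single linear projection per hypernetwork, and the dimensions below are hypothetical and do not come from the repository or the paper; embeddings and biases are ignored.

```python
def adapter_size(d_model: int, bottleneck: int) -> int:
    """Weights of one bottleneck adapter: a down- and an up-projection."""
    return d_model * bottleneck + bottleneck * d_model


def hypernet_size(embed_dim: int, d_model: int, bottleneck: int) -> int:
    """Weights of one hypernetwork, modelled as a single linear map from a
    conditioning embedding to the flattened adapter weights."""
    return embed_dim * adapter_size(d_model, bottleneck)


def hyperformer_params(num_layers: int, embed_dim: int, d_model: int, bottleneck: int) -> int:
    """One hypernetwork per layer, each shared across tasks."""
    return num_layers * hypernet_size(embed_dim, d_model, bottleneck)


def hyperformer_pp_params(embed_dim: int, d_model: int, bottleneck: int) -> int:
    """A single hypernetwork shared across tasks and layers."""
    return hypernet_size(embed_dim, d_model, bottleneck)


# Illustrative sizes only; they are not the settings used in the configs.
per_layer = hyperformer_params(num_layers=12, embed_dim=64, d_model=512, bottleneck=32)
shared = hyperformer_pp_params(embed_dim=64, d_model=512, bottleneck=32)
print(per_layer // shared)  # ratio equals the number of layers in this toy model
```

In this toy model the per-layer variant needs as many hypernetworks as there are layers, so its hypernetwork parameter count grows linearly with depth while the shared variant stays constant.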
To run the adapter† model (this model shares the layer normalization between adapters across the tasks and trains the adapters in a multi-task setting):
```
bash scripts/adapters_dagger.sh
```
To run the adapter model (this model trains a single adapter per task, in a single-task learning setup):
```
bash scripts/adapters.sh
```
To run the T5 fine-tuning model in a multi-task learning setup:
```
bash scripts/finetune.sh
```
To run the T5 fine-tuning model in a single-task learning setup:
```
bash scripts/finetune_single_task.sh
```

We run all the models on 4 GPUs, but this is not necessary; the models can also be run on 1 GPU. To run on a single GPU, remove the -m torch.distributed.launch --nproc_per_node=4 part from all the scripts.
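As a rough illustration of the single-GPU change, the sketch below removes the launcher arguments from a command string. The helper `strip_launcher` and the example command, including the script name `train.py` and the config path, are hypothetical; check the actual scripts before applying anything like this.

```python
import shlex

# The launcher arguments named in the instructions above.
LAUNCHER_ARGS = ["-m", "torch.distributed.launch", "--nproc_per_node=4"]


def strip_launcher(command: str) -> str:
    """Return `command` with the distributed-launcher arguments removed."""
    tokens = shlex.split(command)
    n = len(LAUNCHER_ARGS)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == LAUNCHER_ARGS:
            tokens = tokens[:i] + tokens[i + n:]
            break
    return shlex.join(tokens)


# Hypothetical command line; the real scripts may differ.
multi_gpu = "python -m torch.distributed.launch --nproc_per_node=4 train.py configs/example.json"
print(strip_launcher(multi_gpu))  # python train.py configs/example.json
```

A command that does not contain the launcher arguments is returned unchanged, so the helper is safe to apply to every script line.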

Bibliography

If you find this repo useful, please cite our paper.

@inproceedings{karimi2021parameterefficient,
  title={Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks},
  author={Karimi Mahabadi, Rabeeh and Ruder, Sebastian and Dehghani, Mostafa and Henderson, James},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2021}
}

Final words

I hope this repo is useful for your research. For any questions, please create an issue or email [email protected], and I will get back to you as soon as possible.

Owner

Rabeeh Karimi Mahabadi
