Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Overview

Cross-Attention Transfer for Machine Translation

This repo hosts the code to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021.

Setup

We provide our scripts and modifications to Fairseq. In this section, we describe how to go about running the code and, for instance, reproduce Table 2 in the paper.

Data

To view the data as we prepared and used it, switch to the main branch. But we recommend cloning code from this branch to avoid downloading a large amount of data at once. You can always obtain any data as necessary from the main branch.

Installations

We worked in a conda environment with Python 3.8.

  • First install the requirements.
      pip install requirements.txt
  • Then install Fairseq. To have the option to modify the package, install it in editable mode.
      cd fairseq-modified
      pip install -e .
  • Finally, set the following environment variable.
      export FAIRSEQ=$PWD
      cd ..

Experiments

For the purpose of this walk-through, we assume we want to train a De–En model, using the following data:

De-En
├── iwslt13.test.de
├── iwslt13.test.en
├── iwslt13.test.tok.de
├── iwslt13.test.tok.en
├── iwslt15.tune.de
├── iwslt15.tune.en
├── iwslt15.tune.tok.de
├── iwslt15.tune.tok.en
├── iwslt16.train.de
├── iwslt16.train.en
├── iwslt16.train.tok.de
└── iwslt16.train.tok.en

by transferring from a Fr–En parent model, the experiment files of which is stored under FrEn/checkpoints.

  • Start by making an experiment folder and preprocessing the data.
      mkdir test_exp
      ./xattn-transfer-for-mt/scripts/data_preprocessing/prepare_bi.sh \
          de en test_exp/ \
          De-En/iwslt16.train.tok De-En/iwslt15.tune.tok De-En/iwslt13.test.tok \
          8000
    Please note that prepare_bi.sh is written for the most general case, where you are learning vocabulary for both the source and target sides. When necessary modify it, and reuse whatever vocabulary you want. In this case, e.g., since we are transferring from Fr–En to De–En, we will reuse the target side vocabulary from the parent. So 8000 refers to the source vocabulary size, and we need to copy parent target vocabulary instead of learning one in the script.
      cp ./FrEn/data/tgt.sentencepiece.bpe.model $DATA
      cp ./FrEn/data/tgt.sentencepiece.bpe.vocab $DATA
  • Now you can run an experiment. Here we want to just update the source embeddings and the cross-attention. So we run the corresponding script. Script names are self-explanatory. Set the correct path to the desired parent model checkpoint in the script, and:
      bash ./xattn-transfer-for-mt/scripts/training/reinit-src-embeddings-and-finetune-parent-model-on-translation_src+xattn.sh \
          test_exp/ de en
  • Finally, after training, evaluate your model. Set the correct path to the detokenizer that you use in the script, and:
      bash ./xattn-transfer-for-mt/scripts/evaluation/decode_and_score_valid_and_test.sh \
          test_exp/ de en \
          $PWD/De-En/iwslt15.tune.en $PWD/De-En/iwslt13.test.en

Issues

Please contact us and report any problems you might face through the issues tab of the repo. Thanks in advance for helping us improve the repo!

Credits

The main body of code is built upon Fairseq. We found it very easy to navigate and modify. Kudos to the developers!
The data preprocessing scripts are adopted from FLORES scripts.
To have mBART fit on the GPUs that we worked with memory-wise, we used the trimming solution provided here.

Citation

@inproceedings{gheini-cross-attention,
  title = "Cross-Attention is All You Need: {A}dapting Pretrained {T}ransformers for Machine Translation",
  author = "Gheini, Mozhdeh and Ren, Xiang and May, Jonathan",
  booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
  month = nov,
  year = "2021"
}
Owner
Mozhdeh Gheini
Computer Science Ph.D. Student at the University of Southern California
Mozhdeh Gheini
😊 Python module for face feature changing

PyWarping Python module for face feature changing Installation pip install pywarping If you get an error: No such file or directory: 'cmake': 'cmake',

Dopevog 10 Sep 10, 2021
A mini library for Policy Gradients with Parameter-based Exploration, with reference implementation of the ClipUp optimizer from NNAISENSE.

PGPElib A mini library for Policy Gradients with Parameter-based Exploration [1] and friends. This library serves as a clean re-implementation of the

NNAISENSE 56 Jan 01, 2023
Pose Transformers: Human Motion Prediction with Non-Autoregressive Transformers

Pose Transformers: Human Motion Prediction with Non-Autoregressive Transformers This is the repo used for human motion prediction with non-autoregress

Idiap Research Institute 26 Dec 14, 2022
Large dataset storage format for Pytorch

H5Record Large dataset ( 100G, = 1T) storage format for Pytorch (wip) Support python 3 pip install h5record Why? Writing large dataset is still a

theblackcat102 43 Oct 22, 2022
Tutorial materials for Part of NSU Intro to Deep Learning with PyTorch.

Intro to Deep Learning Materials are part of North South University (NSU) Intro to Deep Learning with PyTorch workshop series. (Slides) Related materi

Hasib Zunair 9 Jun 08, 2022
LRBoost is a scikit-learn compatible approach to performing linear residual based stacking/boosting.

LRBoost is a sckit-learn compatible package for linear residual boosting. LRBoost combines a linear estimator and a non-linear estimator to leverage t

Andrew Patton 5 Nov 23, 2022
Code for "FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection", ICRA 2021

FGR This repository contains the python implementation for paper "FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection"(I

Yi Wei 31 Dec 08, 2022
Differential Privacy for Heterogeneous Federated Learning : Utility & Privacy tradeoffs

Differential Privacy for Heterogeneous Federated Learning : Utility & Privacy tradeoffs In this work, we propose an algorithm DP-SCAFFOLD(-warm), whic

19 Nov 10, 2022
Supervised domain-agnostic prediction framework for probabilistic modelling

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data

The Alan Turing Institute 112 Oct 23, 2022
Controlling Hill Climb Racing with Hand Tacking

Controlling Hill Climb Racing with Hand Tacking Opened Palm for Gas Closed Palm for Brake

Rohit Ingole 3 Jan 18, 2022
A NSFW content filter.

Project_Nfilter A NSFW content filter. With a motive of minimizing the spreads and leakage of NSFW contents on internet and access to others devices ,

1 Jan 20, 2022
Fast Soft Color Segmentation

Fast Soft Color Segmentation

3 Oct 29, 2022
Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images"

GANInversion_with_ConsecutiveImgs Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images" https://a

QingyangXu 38 Dec 07, 2022
基于DouZero定制AI实战欢乐斗地主

DouZero_For_Happy_DouDiZhu: 将DouZero用于欢乐斗地主实战 本项目基于DouZero 环境配置请移步项目DouZero 模型默认为WP,更换模型请修改start.py中的模型路径 运行main.py即可 SL (baselines/sl/): 基于人类数据进行深度学习

1.5k Jan 08, 2023
Perform zero-order Hankel Transform for an 1D array (float or real valued).

perform zero-order Hankel Transform for an 1D array (float or real valued). An discrete form of Parseval theorem is guaranteed. Suit for iterative problems.

1 Jan 17, 2022
Data-depth-inference - Data depth inference with python

Welcome! This readme will guide you through the use of the code in this reposito

Marco 3 Feb 08, 2022
The official implementation of the Hybrid Self-Attention NEAT algorithm

PUREPLES - Pure Python Library for ES-HyperNEAT About This is a library of evolutionary algorithms with a focus on neuroevolution, implemented in pure

Adrian Westh 91 Dec 12, 2022
Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data arXiv This is the code base for weakly supervised NER. We provide a

Amazon 92 Jan 04, 2023
How to train a CNN to 99% accuracy on MNIST in less than a second on a laptop

Training a NN to 99% accuracy on MNIST in 0.76 seconds A quick study on how fast you can reach 99% accuracy on MNIST with a single laptop. Our answer

Tuomas Oikarinen 42 Dec 10, 2022
A list of multi-task learning papers and projects.

This page contains a list of papers on multi-task learning for computer vision. Please create a pull request if you wish to add anything. If you are interested, consider reading our recent survey pap

svandenh 297 Dec 17, 2022