[ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

Last update: Sep 15, 2022

Overview

AMOS

This repository contains the scripts for fine-tuning AMOS pretrained models on the GLUE and SQuAD 2.0 benchmarks.

Paper: Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

Overview

We provide the scripts in two versions, based on two widely used open-source codebases: the Fairseq library and the Hugging Face Transformers library. The two versions are mostly equivalent in functionality, and you are free to use either. Note, however, that the Fairseq version is the one we used in our experiments and will best reproduce the results in the paper; the Hugging Face version was implemented later for compatibility with the Hugging Face Transformers library and may yield slightly different results.

To run the code, follow the README file in each of the two directories.

GLUE Fine-Tuning Results

The General Language Understanding Evaluation (GLUE) benchmark is a collection of sentence- or sentence-pair language understanding tasks for evaluating and analyzing natural language understanding systems.

GLUE dev set results of the AMOS base++ model are as follows (median of 5 different random seeds):

Model	MNLI-m/mm	QQP	QNLI	SST-2	CoLA	RTE	MRPC	STS-B	AVG
AMOS base++	90.5/90.4	92.4	94.4	95.5	71.8	86.6	91.7	92.0	89.4

GLUE test set results of the AMOS base++ model are as follows (no ensembling or task-specific tricks):

Model	MNLI-m/mm	QQP	QNLI	SST-2	CoLA	RTE	MRPC	STS-B	AVG
AMOS base++	90.4/89.9	90.2	94.6	96.8	69.2	83.6	88.9	91.3	88.1
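
The README does not state how the AVG column is computed. As a quick sanity check, the sketch below assumes it is the unweighted mean over the eight tasks, with MNLI represented by its matched (MNLI-m) score; under that assumption the values in both tables above are reproduced. The function name `glue_average` is illustrative and not part of this repository.

```python
# Scores copied from the dev and test tables above.
# Assumption: MNLI contributes its matched (MNLI-m) score to the average.
DEV = {
    "MNLI-m": 90.5, "QQP": 92.4, "QNLI": 94.4, "SST-2": 95.5,
    "CoLA": 71.8, "RTE": 86.6, "MRPC": 91.7, "STS-B": 92.0,
}
TEST = {
    "MNLI-m": 90.4, "QQP": 90.2, "QNLI": 94.6, "SST-2": 96.8,
    "CoLA": 69.2, "RTE": 83.6, "MRPC": 88.9, "STS-B": 91.3,
}


def glue_average(scores):
    """Unweighted mean of the per-task scores, rounded to one decimal."""
    return round(sum(scores.values()) / len(scores), 1)


dev_avg = glue_average(DEV)
test_avg = glue_average(TEST)
print("dev AVG:", dev_avg)
print("test AVG:", test_avg)
```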

SQuAD 2.0 Fine-Tuning Results

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

SQuAD 2.0 dev set results of the AMOS base++ model are as follows (median of 5 different random seeds):

Model	EM	F1
AMOS base++	85.0	87.9
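
For readers unfamiliar with the two metrics, here is a minimal sketch of how exact match (EM) and token-level F1 are commonly computed for a single SQuAD-style prediction. It follows the general shape of the public SQuAD evaluation (lowercasing, removing punctuation and English articles, then comparing tokens) and is not the evaluation code in this repository; the function names are illustrative. An empty string stands for "no answer", so an unanswerable question is scored as correct only when the prediction is also empty.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction, gold):
    """Token-level F1 between a normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Unanswerable case: credit only when both sides are empty.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The Eiffel Tower", "eiffel tower!"))
print(f1_score("in Paris France", "Paris"))
```

Dataset-level EM and F1 are then averages of these per-question scores, expressed as percentages.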

Citation

If you find the code and models useful for your research, please cite the following paper:

@inproceedings{meng2022amos,
  title={Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators},
  author={Meng, Yu and Xiong, Chenyan and Bajaj, Payal and Tiwary, Saurabh and Bennett, Paul and Han, Jiawei and Song, Xia},
  booktitle={International Conference on Learning Representations},
  year={2022}
}

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Comments

Training loss and acc/auc curve

Hi, I'm using AMOS now. My AMOS model (small size, discriminator) has low recall (70-80% precision but only 40% recall), and the generator's MLM accuracy is 60%. Could you post the loss of both the base and large models (or even share the training loss, accuracy, or AUC curves) so that I have a reference point when training my own models? This would help me a lot!

Thank you.

opened by wwx13 1
Bump numpy from 1.21.2 to 1.22.0 in /huggingface
Bumps numpy from 1.21.2 to 1.22.0.

dependencies
opened by dependabot[bot] 0

Releases(v0.1.0)

v0.1.0(Apr 7, 2022)
We release the pretrained AMOS model checkpoint and the dictionary file:

amos.tar.gz contains the AMOS base++ model; you need to extract the model from the archive.

dict.tar.gz contains the sentencepiece model (sp.model) and the vocabulary file (dict.txt).
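
One way to unpack the two archives is with Python's standard `tarfile` module, as sketched below. The helper name `extract_archive` and the output directory names are assumptions made for illustration, and the archives are assumed to have been downloaded into the current directory. Only extract archives from a source you trust.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and list the files found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    # Relative paths of all regular files under dest_dir after extraction.
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Archive names come from the release; the target directories are hypothetical.
    for name, target in (("amos.tar.gz", "amos"), ("dict.tar.gz", "dict")):
        if Path(name).exists():
            print(name, "->", extract_archive(name, target))
        else:
            print(name, "not found; download it from the release page first")
```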

Source code(tar.gz)
Source code(zip)
amos.tar.gz(511.14 MB)
dict.tar.gz(935.94 KB)

Owner

Microsoft

Open source projects and samples from Microsoft
