Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation

Last update: Nov 30, 2022

Overview

Knowledge Distillation for BERT Unsupervised Domain Adaptation

Official PyTorch implementation | Paper

Abstract

The pre-trained language model BERT has brought significant performance improvements across a range of natural language processing tasks. Because the model is trained on a large corpus of diverse topics, it is relatively robust to domain shift, in which the data distributions at training time (source data) and test time (target data) differ while sharing similarities. Despite this improvement over previous models, BERT still suffers performance degradation under domain shift. To mitigate this problem, we propose a simple but effective unsupervised domain adaptation method, adversarial adaptation with distillation (AAD), which combines the adversarial discriminative domain adaptation (ADDA) framework with knowledge distillation. We evaluate our approach on cross-domain sentiment classification over 30 domain pairs, advancing the state-of-the-art performance for unsupervised domain adaptation in text sentiment classification.
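To make the knowledge distillation ingredient concrete, here is a minimal sketch of a temperature-softened distillation loss. It is written in plain Python so that it runs without dependencies, and it is not taken from the repository's code. The function names, the default temperature of 2.0, and the `T^2` scaling follow the common textbook formulation and are assumptions here. How this term is weighted against the adversarial loss in AAD is not shown.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: the result is scaled by temperature**2, as in the usual
    formulation of knowledge distillation.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (fixed)
    log_q = log_softmax(student_logits, temperature)  # student (trained)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


if __name__ == "__main__":
    # Hypothetical two-class sentiment logits for a single review.
    print(distillation_loss([2.0, -1.0], [0.5, 0.3]))
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge. In a PyTorch training loop the same quantity would normally be computed on batched tensors rather than Python lists.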

Requirements

pandas
pytorch
transformers

Run the test

$ python main.py --pretrain --adapt --src books --tgt dvd
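Since the evaluation covers many source/target pairs, it can be convenient to generate one command per ordered pair. The sketch below only builds the command lines. It reuses the flags from the command above, and the helper name `build_commands` is hypothetical. Only `books` and `dvd` appear in this README, so any other domain names passed in are assumptions that must match what `main.py` accepts.

```python
import shlex
from itertools import permutations


def build_commands(domains):
    """Return one argv list per ordered (source, target) domain pair."""
    return [
        ["python", "main.py", "--pretrain", "--adapt", "--src", src, "--tgt", tgt]
        for src, tgt in permutations(domains, 2)
    ]


if __name__ == "__main__":
    # Only these two domain names are confirmed by the README.
    for argv in build_commands(["books", "dvd"]):
        print(shlex.join(argv))
```

A list of n domains yields n(n-1) ordered pairs, so six domains would produce 30 commands. Each argv list can be passed to `subprocess.run` to execute the pairs one after another.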

How to cite

@article{ryu2020knowledge,
  title={Knowledge Distillation for BERT Unsupervised Domain Adaptation},
  author={Ryu, Minho and Lee, Kichun},
  journal={arXiv preprint arXiv:2010.11478},
  year={2020}
}

Owner

Minho Ryu
