Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

Last update: Dec 29, 2022

Overview

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

[Arxiv] [Paper]

As acquiring pixel-wise annotations of real-world images for semantic segmentation is a costly process, a model can instead be trained with more accessible synthetic data and adapted to real images without requiring their annotations. This process is studied in Unsupervised Domain Adaptation (UDA).

Even though a large number of methods propose new UDA strategies, they are mostly based on outdated network architectures. In this work, we particularly study the influence of the network architecture on UDA performance and propose DAFormer, a network architecture tailored for UDA. It consists of a Transformer encoder and a multi-level context-aware feature fusion decoder.

DAFormer is enabled by three simple but crucial training strategies to stabilize the training and to avoid overfitting the source domain: While the Rare Class Sampling on the source domain improves the quality of pseudo-labels by mitigating the confirmation bias of self-training towards common classes, the Thing-Class ImageNet Feature Distance and a Learning Rate Warmup promote feature transfer from ImageNet pretraining.
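
To make the Rare Class Sampling idea more concrete, here is a minimal sketch. It is not the repository's implementation (that lives in mmseg/datasets/uda_dataset.py). It assumes that the sampling probability of a class is a temperature-scaled softmax over (1 - f_c), where f_c is the fraction of source pixels belonging to class c. The function name, the example frequencies, and the temperature values are illustrative assumptions.

```python
import math
import random


def rcs_class_probabilities(class_freq, temperature=0.01):
    """Return a sampling probability per class that favors rare classes.

    class_freq maps class name -> fraction of source pixels (assumed input).
    """
    # Rarer classes get a larger score (1 - f_c); a smaller temperature
    # skews the distribution more strongly towards them.
    scores = {c: (1.0 - f) / temperature for c, f in class_freq.items()}
    # Subtract the maximum score for numerical stability of the softmax.
    max_score = max(scores.values())
    exp_scores = {c: math.exp(s - max_score) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: e / total for c, e in exp_scores.items()}


# Hypothetical pixel frequencies for three classes (not measured values).
freq = {"road": 0.60, "car": 0.35, "train": 0.05}
probs = rcs_class_probabilities(freq, temperature=0.5)

# Draw the class whose source images would be sampled next.
rng = random.Random(0)
sampled = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
print(probs, sampled)
```

In this sketch, a source image containing the drawn class would then be selected, so that rare classes such as train appear more often during training than their pixel frequency alone would suggest.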

DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA→Cityscapes and by 5.4 mIoU for Synthia→Cityscapes and enables learning even difficult classes such as train, bus, and truck well.

The strengths of DAFormer, compared to the previous state-of-the-art UDA method ProDA, can also be observed in qualitative examples from the Cityscapes validation set.

For more information on DAFormer, please check our [Paper].

If you find this project useful in your research, please consider citing:

@article{hoyer2021daformer,
  title={DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation},
  author={Hoyer, Lukas and Dai, Dengxin and Van Gool, Luc},
  journal={arXiv preprint arXiv:2111.14887},
  year={2021}
}

Setup Environment

For this project, we used Python 3.8.5. We recommend setting up a new virtual environment:

python -m venv ~/venv/daformer
source ~/venv/daformer/bin/activate

In that environment, the requirements can be installed with:

pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.3.7  # requires the other packages to be installed first

Further, please download the MiT weights and a pretrained DAFormer using the following script. If problems occur with the automatic download, please follow the instructions for a manual download within the script.

sh tools/download_checkpoints.sh

All experiments were executed on an NVIDIA RTX 2080 Ti.

Inference Demo

Already at this point, the provided DAFormer model (downloaded by tools/download_checkpoints.sh) can be applied to a demo image:

python -m demo.image_demo demo/demo.png work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/211108_1622_gta2cs_daformer_s0_7f24c.json work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/latest.pth

When judging the predictions, please keep in mind that DAFormer had no access to real-world labels during the training.
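
The demo command above takes both the config and the checkpoint from the same run directory. The following sketch assembles that command for an arbitrary run directory. It assumes that every run directory follows the layout `<run_name>/<run_name>.json` plus `latest.pth`, which is inferred from the single command above and not confirmed for other runs; the helper name `build_demo_command` is hypothetical.

```python
from pathlib import PurePosixPath


def build_demo_command(image, run_dir):
    """Assemble the image-demo command for a run directory.

    Assumes the layout <run_dir>/<run_name>.json and <run_dir>/latest.pth.
    """
    run = PurePosixPath(run_dir)
    config = run / f"{run.name}.json"
    checkpoint = run / "latest.pth"
    return ["python", "-m", "demo.image_demo", image, str(config), str(checkpoint)]


cmd = build_demo_command(
    "demo/demo.png", "work_dirs/211108_1622_gta2cs_daformer_s0_7f24c"
)
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run` from the repository root if the demo should be launched from a script.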

Setup Datasets

Cityscapes: Please download leftImg8bit_trainvaltest.zip and gt_trainvaltest.zip from here and extract them to data/cityscapes.

GTA: Please download all image and label packages from here and extract them to data/gta.

Synthia: Please download SYNTHIA-RAND-CITYSCAPES from here and extract it to data/synthia.

The final folder structure should look like this:

DAFormer
├── ...
├── data
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── gta
│   │   ├── images
│   │   ├── labels
│   ├── synthia
│   │   ├── RGB
│   │   ├── GT
│   │   │   ├── LABELS
├── ...
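
Before running the preprocessing, it can help to verify that the extracted archives match the folder structure above. The following is a small sketch of such a check; the function name `missing_dataset_dirs` is hypothetical and the script is not part of the repository.

```python
from pathlib import Path

# Sub-folders taken from the folder structure above.
EXPECTED_DIRS = [
    "cityscapes/leftImg8bit/train",
    "cityscapes/leftImg8bit/val",
    "cityscapes/gtFine/train",
    "cityscapes/gtFine/val",
    "gta/images",
    "gta/labels",
    "synthia/RGB",
    "synthia/GT/LABELS",
]


def missing_dataset_dirs(data_root="data"):
    """Return the expected sub-folders that do not exist under data_root."""
    root = Path(data_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs("data")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

Run from the DAFormer root directory, the script lists any folder that still needs to be extracted.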

Data Preprocessing: Finally, please run the following scripts to convert the label IDs to the train IDs and to generate the class index for RCS:

python tools/convert_datasets/gta.py data/gta --nproc 8
python tools/convert_datasets/cityscapes.py data/cityscapes --nproc 8
python tools/convert_datasets/synthia.py data/synthia/ --nproc 8
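
To illustrate what converting label IDs to train IDs means, here is a minimal sketch that uses the standard Cityscapes label definition, where 19 classes are evaluated and all remaining label IDs are ignored. It operates on nested lists instead of image files and is only an illustration; the conversion scripts above may differ in their details, and Synthia uses its own label IDs that are not covered here.

```python
# Standard Cityscapes label ID -> train ID mapping for the 19 evaluated classes.
LABEL_TO_TRAIN_ID = {
    7: 0, 8: 1, 11: 2, 12: 3, 13: 4, 17: 5, 19: 6, 20: 7, 21: 8, 22: 9,
    23: 10, 24: 11, 25: 12, 26: 13, 27: 14, 28: 15, 31: 16, 32: 17, 33: 18,
}
IGNORE_INDEX = 255  # label IDs outside the mapping are ignored during training


def to_train_ids(label_map):
    """Convert a 2D list of label IDs into a 2D list of train IDs."""
    return [
        [LABEL_TO_TRAIN_ID.get(label_id, IGNORE_INDEX) for label_id in row]
        for row in label_map
    ]


# Tiny hypothetical 2x3 label map: road, sidewalk, car / sky, unlabeled, train.
labels = [[7, 8, 26], [23, 0, 31]]
print(to_train_ids(labels))
```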

Training

For convenience, we provide an annotated config file of the final DAFormer. A training job can be launched using:

python run_experiments.py --config configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py

For the experiments in our paper (e.g. network architecture comparison, component ablations, ...), we use a system to automatically generate and train the configs:

python run_experiments.py --exp <ID>

More information about the available experiments and their assigned IDs can be found in experiments.py. The generated configs will be stored in configs/generated/.

Testing & Predictions

The provided DAFormer checkpoint trained on GTA→Cityscapes (already downloaded by tools/download_checkpoints.sh) can be tested on the Cityscapes validation set using:

sh test.sh work_dirs/211108_1622_gta2cs_daformer_s0_7f24c

The predictions are saved for inspection to work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/preds and the mIoU of the model is printed to the console. The provided checkpoint should achieve 68.85 mIoU. Refer to the end of work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/20211108_164105.log for more information such as the class-wise IoU.
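
For readers unfamiliar with the metric, the following sketch shows how a mean intersection-over-union (mIoU) can be computed from predicted and ground-truth train IDs. It is a simplified illustration on flat lists, assuming that predictions lie in the range of valid classes and that pixels labeled 255 are ignored; the evaluation code used by test.sh may differ in its details, and the function name is hypothetical.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Compute per-class IoU and their mean from flat label sequences."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # ignored pixels do not count towards any class
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A mismatch enlarges the union of both the predicted and true class.
            union[p] += 1
            union[t] += 1
    # Classes that occur in neither prediction nor ground truth are skipped.
    ious = {c: intersection[c] / union[c] for c in range(num_classes) if union[c] > 0}
    return ious, sum(ious.values()) / len(ious)


# Hypothetical predictions and labels for six pixels and three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
ious, miou = mean_iou(pred, target, num_classes=3)
print(ious, miou)
```

Reported values such as 68.85 mIoU correspond to this quantity expressed in percent and accumulated over the whole validation set.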

Similarly, other models can be tested after training has finished:

sh test.sh path/to/checkpoint_directory

Framework Structure

This project is based on mmsegmentation version 0.16.0. For more information about the framework structure and the config system, please refer to the mmsegmentation documentation and the mmcv documentation.

The most relevant files for DAFormer are:

configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py: Annotated config file for the final DAFormer.
mmseg/models/uda/dacs.py: Implementation of UDA self-training with ImageNet Feature Distance.
mmseg/datasets/uda_dataset.py: Data loader for UDA with Rare Class Sampling.
mmseg/models/decode_heads/daformer_head.py: Implementation of DAFormer decoder with context-aware feature fusion.
mmseg/models/backbones/mix_transformer.py: Implementation of Mix Transformer encoder (MiT).

Acknowledgements

This project is based on the following open-source projects. We thank their authors for making the source code publicly available.

Owner

Lukas Hoyer
