The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer"

Last update: Nov 29, 2022

Shuffle Transformer

Introduction

Very recently, window-based Transformers, which compute self-attention within non-overlapping local windows, have demonstrated promising results on image classification, semantic segmentation, and object detection. However, little study has been devoted to the cross-window connection, which is the key element for improving representation ability. Shuffle Transformer revisits the spatial shuffle as an efficient way to build connections among windows; it is easy to implement, requiring changes to only two lines of code. Furthermore, a depth-wise convolution is introduced to complement the spatial shuffle by enhancing neighbor-window connections. The proposed architectures achieve excellent performance on a wide range of visual tasks, including image-level classification, object detection, and semantic segmentation.
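
To make the idea of spatial shuffle concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the function name window_groups and the 1-D token row are assumptions chosen for illustration. It only shows which token positions end up attending together, with and without the shuffle. In a tensor implementation the same regrouping would typically be a reshape followed by a transpose along each spatial axis.

```python
def window_groups(size, window, shuffle=False):
    """Group token positions 0..size-1 into windows of `window` tokens.

    Hypothetical helper for illustration only.
    shuffle=False: contiguous, non-overlapping local windows.
    shuffle=True:  each window takes tokens strided by the number of
                   windows, so it mixes tokens from different local windows.
    """
    if size % window != 0:
        raise ValueError("size must be divisible by window")
    num_windows = size // window
    if shuffle:
        # Window w holds positions w, w + num_windows, w + 2 * num_windows, ...
        return [[w + k * num_windows for k in range(window)]
                for w in range(num_windows)]
    # Window w holds positions w * window, ..., w * window + window - 1
    return [[w * window + k for k in range(window)]
            for w in range(num_windows)]


print(window_groups(8, 4))                # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(window_groups(8, 4, shuffle=True))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

The window count and window size are the same in both cases, so the attention cost is unchanged; only the membership of each window differs. After the shuffle, every window contains tokens drawn from both of the original local windows, which is what lets information cross window boundaries.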

Requirements

PyTorch==1.7.1
torchvision==0.8.2
timm==0.3.2

Apex is optional and can be installed for faster training:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Other Requirements

pip install opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8
pip install einops

Main Results

Results on ImageNet-1K

name	acc@1	#params	FLOPs	Throughput (images/s)	Weights
Shuffle-T	82.4	28M	4.6G	791	google drive
Shuffle-S	83.6	50M	8.9G	450	google drive
Shuffle-B	84.0	88M	15.6G	279	google drive

Usage

For classification on ImageNet-1K, to train from scratch, run:

python -m torch.distributed.launch --nproc_per_node <num-gpus> main.py \
--cfg <config-file> --data-path <imagenet-path> [--batch-size <batch-size> --output <output-dir>]

To evaluate, run:

python -m torch.distributed.launch --nproc_per_node <num-gpus> main.py --eval \
--cfg <config-file> --resume <checkpoint> --data-path <imagenet-path>

In progress

Semantic Segmentation
Instance Segmentation

Citing Shuffle Transformer

@article{huang2021shuffle,
 title={Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer},
 author={Huang, Zilong and Ben, Youcheng and Luo, Guozhong and Cheng, Pei and Yu, Gang and Fu, Bin},
 journal={arXiv preprint arXiv:2106.03650},
 year={2021}
}

Acknowledgement

Thanks to the open-source implementation of Swin Transformer.
