The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer"

Overview

Shuffle Transformer

The implementation of "Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer"

Introduction

Very recently, window-based Transformers, which computed self-attention within non-overlapping local windows, demonstrated promising results on image classification, semantic segmentation, and object detection. However, less study has been devoted to the cross-window connection which is the key element to improve the representation ability. Shuffle Transformer revisit the spatial shuffle as an efficient way to build connections among windows, which is highly efficient and easy to implement by modifying two lines of code. Furthermore, the depth-wise convolution is introduced to complement the spatial shuffle for enhancing neighbor-window connections. The proposed architectures achieve excellent performance on a wide range of visual tasks including image-level classification, object detection, and semantic segmentation.

st

Requirements

  • PyTorch==1.7.1
  • torchvision==0.8.2
  • timm==0.3.2

The Apex is optional for faster training speed.

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Other Requirements

pip install opencv-python==4.4.0.46 termcolor==1.1.0 yacs==0.1.8
pip install einops

Main Results

Results on ImageNet-1K
name [email protected] #params FLOPs Throughputs(Images/s) Weights
Shuffle-T 82.4 28M 4.6G 791 google drive
Shuffle-S 83.6 50M 8.9G 450 google drive
Shuffle-B 84.0 88M 15.6 279 google drive

Usage

For classification on ImageNet-1K, to train from scratch, run:

python -m torch.distributed.launch --nproc_per_node   main.py \ 
--cfg  --data-path  [--batch-size  --output ]

To evaluate, run:

python -m torch.distributed.launch --nproc_per_node  main.py --eval \
--cfg  --resume  --data-path  

In progress

  • Semantic Segmentation
  • Instance Segmentation

Citing Shuffle Transformer

@article{huang2021shuffle,
 title={Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer},
 author={Huang, Zilong and Ben, Youcheng and Luo, Guozhong and Cheng, Pei and Yu, Gang and Fu, Bin},
 journal={arXiv preprint arXiv:2106.03650},
 year={2021}
}

Acknowledgement

Thanks to open-source implementation of Swin-Transformer.

Owner
A thousand miles begin with each single step.
Robust Self-augmentation for NER with Meta-reweighting

Robust Self-augmentation for NER with Meta-reweighting

Lam chi 17 Nov 22, 2022
Implementation of the Remixer Block from the Remixer paper, in Pytorch

Remixer - Pytorch Implementation of the Remixer Block from the Remixer paper, in Pytorch. It claims that substituting the feedforwards in transformers

Phil Wang 35 Aug 23, 2022
Tensorflow Implementation of the paper "Spectral Normalization for Generative Adversarial Networks" (ICML 2017 workshop)

tf-SNDCGAN Tensorflow implementation of the paper "Spectral Normalization for Generative Adversarial Networks" (https://www.researchgate.net/publicati

Nhat M. Nguyen 248 Nov 25, 2022
Supervised domain-agnostic prediction framework for probabilistic modelling

A supervised domain-agnostic framework that allows for probabilistic modelling, namely the prediction of probability distributions for individual data

The Alan Turing Institute 112 Oct 23, 2022
[ArXiv 2021] Data-Efficient Instance Generation from Instance Discrimination

InsGen - Data-Efficient Instance Generation from Instance Discrimination Data-Efficient Instance Generation from Instance Discrimination Ceyuan Yang,

GenForce: May Generative Force Be with You 93 Dec 25, 2022
Dilated Convolution for Semantic Image Segmentation

Multi-Scale Context Aggregation by Dilated Convolutions Introduction Properties of dilated convolution are discussed in our ICLR 2016 conference paper

Fisher Yu 764 Dec 26, 2022
Official implementation of Deep Reparametrization of Multi-Frame Super-Resolution and Denoising

Deep-Rep-MFIR Official implementation of Deep Reparametrization of Multi-Frame Super-Resolution and Denoising Publication: Deep Reparametrization of M

Goutam Bhat 39 Jan 04, 2023
Learnable Motion Coherence for Correspondence Pruning

Learnable Motion Coherence for Correspondence Pruning Yuan Liu, Lingjie Liu, Cheng Lin, Zhen Dong, Wenping Wang Project Page Any questions or discussi

liuyuan 41 Nov 30, 2022
Collection of in-progress libraries for entity neural networks.

ENN Incubator Collection of in-progress libraries for entity neural networks: Neural Network Architectures for Structured State Entity Gym: Abstractio

25 Dec 01, 2022
Devkit for 3D -- Some utils for 3D object detection based on Numpy and Pytorch

D3D Devkit for 3D: Some utils for 3D object detection and tracking based on Numpy and Pytorch Please consider siting my work if you find this library

Jacob Zhong 27 Jul 07, 2022
Anatomy of Matplotlib -- tutorial developed for the SciPy conference

Introduction This tutorial is a complete re-imagining of how one should teach users the matplotlib library. Hopefully, this tutorial may serve as insp

Matplotlib Developers 1.1k Dec 29, 2022
Python implementation of "Multi-Instance Pose Networks: Rethinking Top-Down Pose Estimation"

MIPNet: Multi-Instance Pose Networks This repository is the official pytorch python implementation of "Multi-Instance Pose Networks: Rethinking Top-Do

Rawal Khirodkar 57 Dec 12, 2022
Running AlphaFold2 (from ColabFold) in Azure Machine Learning

Running AlphaFold2 (from ColabFold) in Azure Machine Learning Colby T. Ford, Ph.D. Companion repository for Medium Post: How to predict many protein s

Colby T. Ford 3 Feb 18, 2022
BankNote-Net: Open dataset and encoder model for assistive currency recognition

BankNote-Net: Open Dataset for Assistive Currency Recognition Millions of people around the world have low or no vision. Assistive software applicatio

Microsoft 13 Oct 28, 2022
Fuse radar and camera for detection

SAF-FCOS: Spatial Attention Fusion for Obstacle Detection using MmWave Radar and Vision Sensor This project hosts the code for implementing the SAF-FC

ChangShuo 18 Jan 01, 2023
Multi-robot collaborative exploration and mapping through Voronoi partition and DRL in unknown environment

Voronoi Multi_Robot Collaborate Exploration Introduction In the unknown environment, the cooperative exploration of multiple robots is completed by Vo

PeaceWord 6 Nov 22, 2022
A modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (prediction model)

ParallelFold Author: Bozitao Zhong This is a modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (p

Bozitao Zhong 77 Dec 22, 2022
This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

SO-Pose This repository contains codes of ICCV2021 paper: SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation This paper is basically an

shangbuhuan 52 Nov 25, 2022
An open source app to help calm you down when needed.

By: Seanpm2001, Et; Al. Top README.md Read this article in a different language Sorted by: A-Z Sorting options unavailable ( af Afrikaans Afrikaans |

Sean P. Myrick V19.1.7.2 2 Oct 24, 2022
Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages"

Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data

Ayush Daksh 12 Dec 01, 2022