Swin-Transformer is basically a hierarchical Transformer whose representation is computed with shifted windows.

Overview

Swin-Transformer

Swin-Transformer is basically a hierarchical Transformer whose representation is computed with shifted windows. For more details, please refer to "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows"

This repo is an implementation of MegEngine version Swin-Transformer. This is also a showcase for training on GPU with less memory by leveraging MegEngine DTR technique.

There is also an official PyTorch implementation.

Usage

Install

  • Clone this repo:
git clone https://github.com/MegEngine/swin-transformer.git
cd swin-transformer
  • Install megengine==1.6.0
pip3 install megengine==1.6.0 -f https://megengine.org.cn/whl/mge.html

Training

To train a Swin Transformer using random data, run:

python3 -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> train_random.py

To train a Swin Transformer using AMP (Auto Mix Precision), run:

python3 -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --mode mp train_random.py

To train a Swin Transformer using DTR in dynamic graph mode, run:

python3 -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --dtr [--dtr-thd <eviction-threshold-of-dtr>] train_random.py

To train a Swin Transformer using DTR in static graph mode, run:

python3 -n <num-of-gpus-to-use> -b <batch-size-per-gpu> -s <num-of-train-steps> --trace --symbolic --dtr --dtr-thd <eviction-threshold-of-dtr> train_random.py

For example, to train a Swin Transformer with a single GPU using DTR in static graph mode with threshold=8GB and AMP, run:

python3 -n 1 -b 340 -s 10 --trace --symbolic --dtr --dtr-thd 8 --mode mp train_random.py

For more usage, run:

python3 train_random.py -h

Benchmark

  • Testing Devices
    • 2080Ti @ cuda-10.1-cudnn-v7.6.3-TensorRT-5.1.5.0 @ Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz
    • Reserve all CUDA memory by setting MGB_CUDA_RESERVE_MEMORY=1, in order to alleviate memory fragmentation problem
Settings Maximum Batch Size Speed(s/step) Throughput(images/s)
None 68 0.490 139
AMP 100 0.494 202
DTR in static graph mode 300 2.592 116
DTR in static graph mode + AMP 340 1.944 175

Acknowledgement

We are inspired by the Swin-Transformer repository, many thanks to microsoft!

Owner
旷视天元 MegEngine
旷视天元 MegEngine
GLANet - The code for Global and Local Alignment Networks for Unpaired Image-to-Image Translation arxiv

GLANet The code for Global and Local Alignment Networks for Unpaired Image-to-Image Translation arxiv Framework: visualization results: Getting Starte

stanley 29 Dec 14, 2022
OpenMMLab Model Deployment Toolset

Introduction English | 简体中文 MMDeploy is an open-source deep learning model deployment toolset. It is a part of the OpenMMLab project. Major features F

OpenMMLab 1.5k Dec 30, 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding (CVPR2022)

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding by Qiaole Dong*, Chenjie Cao*, Yanwei Fu Paper and Supple

Qiaole Dong 190 Dec 27, 2022
An e-commerce company wants to segment its customers and determine marketing strategies according to these segments.

customer_segmentation_with_rfm Business Problem : An e-commerce company wants to

Buse Yıldırım 3 Jan 06, 2022
Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera.

Tools to create pixel-wise object masks, bounding box labels (2D and 3D) and 3D object model (PLY triangle mesh) for object sequences filmed with an RGB-D camera. This project prepares training and t

305 Dec 16, 2022
Codes for building and training the neural network model described in Domain-informed neural networks for interaction localization within astroparticle experiments.

Domain-informed Neural Networks Codes for building and training the neural network model described in Domain-informed neural networks for interaction

DIDACTS 0 Dec 13, 2021
SuMa++: Efficient LiDAR-based Semantic SLAM (Chen et al IROS 2019)

SuMa++: Efficient LiDAR-based Semantic SLAM This repository contains the implementation of SuMa++, which generates semantic maps only using three-dime

Photogrammetry & Robotics Bonn 701 Dec 30, 2022
Prediction of MBA refinance Index (Mortgage prepayment)

Prediction of MBA refinance Index (Mortgage prepayment) Deep Neural Network based Model The ability to predict mortgage prepayment is of critical use

Ruchil Barya 1 Jan 16, 2022
Lucid Sonic Dreams syncs GAN-generated visuals to music.

Lucid Sonic Dreams Lucid Sonic Dreams syncs GAN-generated visuals to music. By default, it uses NVLabs StyleGAN2, with pre-trained models lifted from

731 Jan 02, 2023
TorchXRayVision: A library of chest X-ray datasets and models.

torchxrayvision A library for chest X-ray datasets and models. Including pre-trained models. ( 🎬 promo video about the project) Motivation: While the

Machine Learning and Medicine Lab 575 Jan 08, 2023
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan NOTE: This documentation describes a BETA release of PyStan 3. PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is

Stan 229 Dec 29, 2022
A smaller subset of 10 easily classified classes from Imagenet, and a little more French

Imagenette 🎶 Imagenette, gentille imagenette, Imagenette, je te plumerai. 🎶 (Imagenette theme song thanks to Samuel Finlayson) NB: Versions of Image

fast.ai 718 Jan 01, 2023
Towards Long-Form Video Understanding

Towards Long-Form Video Understanding Chao-Yuan Wu, Philipp Krähenbühl, CVPR 2021 [Paper] [Project Page] [Dataset] Citation @inproceedings{lvu2021,

Chao-Yuan Wu 69 Dec 26, 2022
Official code of "R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network."

R2RNet Official code of "R2RNet: Low-light Image Enhancement via Real-low to Real-normal Network." Jiang Hai, Zhu Xuan, Ren Yang, Yutong Hao, Fengzhu

77 Dec 24, 2022
Meandering In Networks of Entities to Reach Verisimilar Answers

MINERVA Meandering In Networks of Entities to Reach Verisimilar Answers Code and models for the paper Go for a Walk and Arrive at the Answer - Reasoni

Shehzaad Dhuliawala 271 Dec 13, 2022
Technical Indicators implemented in Python only using Numpy-Pandas as Magic - Very Very Fast! Very tiny! Stock Market Financial Technical Analysis Python library . Quant Trading automation or cryptocoin exchange

MyTT Technical Indicators implemented in Python only using Numpy-Pandas as Magic - Very Very Fast! to Stock Market Financial Technical Analysis Python

dev 34 Dec 27, 2022
A curated list of awesome Model-Based RL resources

Awesome Model-Based Reinforcement Learning This is a collection of research papers for model-based reinforcement learning (mbrl). And the repository w

OpenDILab 427 Jan 03, 2023
7th place solution of Human Protein Atlas - Single Cell Classification on Kaggle

kaggle-hpa-2021-7th-place-solution Code for 7th place solution of Human Protein Atlas - Single Cell Classification on Kaggle. A description of the met

8 Jul 09, 2021
Code release for DS-NeRF (Depth-supervised Neural Radiance Fields)

Depth-supervised NeRF: Fewer Views and Faster Training for Free Project | Paper | YouTube Pytorch implementation of our method for learning neural rad

524 Jan 08, 2023
RL-driven agent playing tic-tac-toe on starknet against challengers.

tictactoe-on-starknet RL-driven agent playing tic-tac-toe on starknet against challengers. GUI reference: https://pythonguides.com/create-a-game-using

21 Jul 30, 2022