This repo contains the official code and pre-trained models for the Dynamic Vision Transformer (DVT).

Last update: Dec 18, 2022

Overview

Dynamic-Vision-Transformer (PyTorch)

Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length

Update on 2021/06/01: Released the pre-trained models and the inference code for ImageNet.

Introduction

We develop a Dynamic Vision Transformer (DVT) to automatically configure a proper number of tokens for each individual image, leading to a significant improvement in computational efficiency, both theoretically and empirically.

Results

Top-1 accuracy on ImageNet vs. GFLOPs

Top-1 accuracy on CIFAR vs. GFLOPs

Top-1 accuracy on ImageNet vs. Throughput

Visualization

Pre-trained Models

Backbone	# of Exits	# of Tokens	Links
T2T-ViT-12	3	7x7-10x10-14x14	Tsinghua Cloud / Google Drive

The checkpoints contain the following:

**.pth.tar
├── model_state_dict: state dictionaries of the model
├── flops: a list containing the GFLOPs corresponding to exiting at each exit
├── anytime_classification: Top-1 accuracy of each exit
├── dynamic_threshold: the confidence thresholds used in budgeted batch classification
└── budgeted_batch_classification: results of budgeted batch classification (a two-item list; [0] and [1] hold the two coordinates of the curve)
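
As a minimal sketch, the fields listed above could be inspected as follows. The helper names `load_checkpoint` and `summarize_checkpoint` are hypothetical and not part of this repo. The key names come from the list above; the value types are assumptions.

```python
from typing import Any, Dict, List

# Key names taken from the checkpoint description above.
EXPECTED_KEYS = [
    "model_state_dict",
    "flops",
    "anytime_classification",
    "dynamic_threshold",
    "budgeted_batch_classification",
]


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load a **.pth.tar checkpoint onto the CPU (hypothetical helper)."""
    import torch  # imported lazily so the summary helper works without PyTorch

    return torch.load(path, map_location="cpu")


def describe(value: Any) -> str:
    """Return the type name of a value, plus its length when it has one."""
    try:
        return f"{type(value).__name__} of length {len(value)}"
    except TypeError:
        return type(value).__name__


def summarize_checkpoint(ckpt: Dict[str, Any]) -> List[str]:
    """Return one line per expected key, flagging keys that are absent."""
    lines = []
    for key in EXPECTED_KEYS:
        if key in ckpt:
            lines.append(f"{key}: {describe(ckpt[key])}")
        else:
            lines.append(f"{key}: <missing>")
    return lines


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py PATH_TO_CHECKPOINTS
    if len(sys.argv) > 1:
        print("\n".join(summarize_checkpoint(load_checkpoint(sys.argv[1]))))
```

The summary only reports types and lengths, so it stays readable even if a field turns out to be a large tensor.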

Requirements

python 3.7.7
pytorch 1.3.1
torchvision 0.4.2

Evaluate Pre-trained Models

Read the evaluation results saved in the pre-trained models

CUDA_VISIBLE_DEVICES=0 python inference.py --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS --eval_mode 0

Read the confidence thresholds saved in the pre-trained models and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS --eval_mode 1
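
To illustrate how saved confidence thresholds can drive inference, here is a rough sketch. It assumes a sample leaves at the first exit whose confidence (for example, the largest softmax probability) reaches that exit's threshold, and that the last exit accepts everything. The function names and input layout are hypothetical, and `inference.py` may differ in detail. The `flops` list is read as in the checkpoint description: the GFLOPs spent when exiting at each exit.

```python
from typing import Sequence, Tuple


def choose_exit(confidences: Sequence[float], thresholds: Sequence[float]) -> int:
    """Return the index of the first exit whose confidence reaches its threshold.

    Falls back to the last exit when no threshold is met (assumption).
    """
    for k, (conf, thr) in enumerate(zip(confidences, thresholds)):
        if conf >= thr:
            return k
    return len(confidences) - 1


def budgeted_point(
    confidences: Sequence[Sequence[float]],  # confidences[i][k]: sample i, exit k
    correct: Sequence[Sequence[bool]],       # correct[i][k]: is exit k right on sample i
    thresholds: Sequence[float],
    flops: Sequence[float],                  # GFLOPs when exiting at exit k
) -> Tuple[float, float]:
    """Return (average GFLOPs per sample, Top-1 accuracy) for one threshold set."""
    total_flops = 0.0
    n_correct = 0
    for sample_conf, sample_correct in zip(confidences, correct):
        k = choose_exit(sample_conf, thresholds)
        total_flops += flops[k]
        n_correct += int(sample_correct[k])
    n = len(confidences)
    return total_flops / n, n_correct / n


if __name__ == "__main__":
    conf = [[0.9, 0.95, 0.99], [0.3, 0.6, 0.9], [0.2, 0.4, 0.5]]
    hit = [[True, True, True], [False, True, True], [False, False, True]]
    print(budgeted_point(conf, hit, thresholds=[0.8, 0.5, 0.0], flops=[1.0, 2.0, 4.0]))
```

Each threshold set gives one (cost, accuracy) pair; sweeping over threshold sets would trace out a curve like the one stored under `budgeted_batch_classification`.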

Determine the confidence thresholds on the training set and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --batch_size 128 --model DVT_T2t_vit_12 --checkpoint_path PATH_TO_CHECKPOINTS --eval_mode 2
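
One common way to determine such thresholds in early-exit networks is to fix the fraction of samples that should leave at each exit and read off the matching confidence cut-off. The sketch below follows that idea. It is an assumption that `inference.py` works along these lines, and `find_thresholds` with its arguments is a hypothetical name.

```python
from typing import List, Sequence


def find_thresholds(
    confidences: Sequence[Sequence[float]],  # confidences[k][i]: exit k, sample i
    exit_fractions: Sequence[float],         # target share of all samples per exit
) -> List[float]:
    """Pick one threshold per exit so that roughly the target share exits there."""
    n_exits = len(confidences)
    n_samples = len(confidences[0])
    remaining = list(range(n_samples))  # samples that have not exited yet
    thresholds: List[float] = []

    for k in range(n_exits - 1):
        quota = min(len(remaining), round(exit_fractions[k] * n_samples))
        if quota == 0:
            # Nobody should exit here: use an unreachable threshold.
            thresholds.append(float("inf"))
            continue
        # Most confident remaining samples first.
        ranked = sorted(remaining, key=lambda i: confidences[k][i], reverse=True)
        threshold = confidences[k][ranked[quota - 1]]
        thresholds.append(threshold)
        # Samples at or above the threshold exit; ties may exceed the quota.
        remaining = [i for i in ranked if confidences[k][i] < threshold]

    # The last exit accepts every sample that is still left.
    thresholds.append(float("-inf"))
    return thresholds


if __name__ == "__main__":
    conf = [
        [0.9, 0.2, 0.8, 0.1],   # exit 0
        [0.95, 0.7, 0.9, 0.3],  # exit 1
        [1.0, 1.0, 1.0, 1.0],   # exit 2
    ]
    print(find_thresholds(conf, exit_fractions=[0.5, 0.25, 0.25]))
```

Varying `exit_fractions` shifts samples toward cheaper or more expensive exits, which is how a single trained model can serve different compute budgets.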

The dataset is expected to be prepared as follows:

ImageNet
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
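
Before launching evaluation, it can help to confirm that the dataset directory matches the layout above. The following is a small sketch using only the standard library. The function name `check_imagenet_layout` is hypothetical, and the requirement that `train` and `val` contain the same class folders is an assumption. One folder per class is the layout that `torchvision.datasets.ImageFolder` reads; whether `inference.py` uses that class is not stated here.

```python
from pathlib import Path
from typing import Dict


def check_imagenet_layout(root: str) -> Dict[str, int]:
    """Check that root/train and root/val exist and hold matching class folders.

    Returns the number of class folders found in each split.
    """
    class_names = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        if not names:
            raise ValueError(f"no class folders found in {split_dir}")
        class_names[split] = names

    if class_names["train"] != class_names["val"]:
        raise ValueError("train and val contain different class folders")

    return {split: len(names) for split, names in class_names.items()}


if __name__ == "__main__":
    import sys

    # Usage: python check_layout.py PATH_TO_DATASET
    if len(sys.argv) > 1:
        print(check_imagenet_layout(sys.argv[1]))
```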

Contact

If you have any questions, please feel free to contact the authors. Yulin Wang: [email protected].

Acknowledgment

Our T2T-ViT code is from here.

To Do

Update the code for training.
