This is the official implementation of the paper "Token Shift Transformer for Video Classification".

Last update: Dec 30, 2022

TokShift-Transformer

We achieve state-of-the-art performance of 80.40% top-1 accuracy on Kinetics-400 val. Paper link

Updates
Model Zoo and Baselines
Installation
Quick Start
Contributors
Citing
Acknowledgement

Updates

July 11, 2021

Release this V1 version (the version used in the paper) to the public.
We are preparing a V2 version with the following modifications, to be released within one week:

Directly decode video mp4 files during training/evaluation.
Adopt the standardized timm code base.
Performance is further improved over the paper version (+0.5 on average).

April 22, 2021

Add Train/Test guideline and data preparation.

April 16, 2021

Publish TokShift Transformer for video content understanding

Model Zoo and Baselines

architecture	backbone	pretrain	Res & Frames	GFLOPs x views	top1	config
ViT (Video)	Base16	ImgNet21k	224 & 8	134.7 x 30	76.02 `link`	k400_vit_8x32_224.yml
TokShift	Base16	ImgNet21k	224 & 8	134.7 x 30	77.28 `link`	k400_tokshift_div4_8x32_base_224.yml
TokShift (MR)	Base16	ImgNet21k	256 & 8	175.8 x 30	77.68 `link`	k400_tokshift_div4_8x32_base_256.yml
TokShift (HR)	Base16	ImgNet21k	384 & 8	394.7 x 30	78.14 `link`	k400_tokshift_div4_8x32_base_384.yml
TokShift	Base16	ImgNet21k	224 & 16	268.5 x 30	78.18 `link`	k400_tokshift_div4_16x32_base_224.yml
TokShift-Large (HR)	Large16	ImgNet21k	384 & 8	1397.6 x 30	79.83 `link`	k400_tokshift_div4_8x32_large_384.yml
TokShift-Large (HR)	Large16	ImgNet21k	384 & 12	2096.4 x 30	80.40 `link`	k400_tokshift_div4_12x32_large_384.yml

Below is the training log. During validation we use 3-view evaluation (instead of 30 views) to save time.
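
The "GFLOPs x views" column in the table above can be read as per-view cost times the number of views scored per video. The short sketch below totals that cost for each config and compares 30-view testing with 3-view validation. The per-view numbers are copied from the table; the dictionary and function names are illustrative and not part of the repository.

```python
# Per-view GFLOPs copied from the model zoo table (keys are the config names without ".yml").
PER_VIEW_GFLOPS = {
    "k400_vit_8x32_224": 134.7,
    "k400_tokshift_div4_8x32_base_224": 134.7,
    "k400_tokshift_div4_8x32_base_256": 175.8,
    "k400_tokshift_div4_8x32_base_384": 394.7,
    "k400_tokshift_div4_16x32_base_224": 268.5,
    "k400_tokshift_div4_8x32_large_384": 1397.6,
    "k400_tokshift_div4_12x32_large_384": 2096.4,
}


def total_gflops(per_view: float, views: int) -> float:
    """Inference cost for one video: per-view cost times the number of views."""
    return per_view * views


if __name__ == "__main__":
    for name, per_view in PER_VIEW_GFLOPS.items():
        full = total_gflops(per_view, 30)  # 30-view testing, as in the table
        fast = total_gflops(per_view, 3)   # 3-view validation during training
        print(f"{name}: 30 views = {full:.1f} GFLOPs, 3 views = {fast:.1f} GFLOPs")
```

Because the cost is linear in the number of views, 3-view validation costs one tenth of 30-view testing for every model.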

Installation

PyTorch >= 1.7, torchvision
tensorboardx
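
A quick way to confirm the requirements above before training is to query the installed package versions. The following is a minimal sketch using only the standard library; it assumes the PyPI distribution names `torch`, `torchvision` and `tensorboardX`, and the helper names are hypothetical.

```python
import re
from importlib import metadata


def parse_version(text: str) -> tuple:
    """Extract the leading numeric components, e.g. '1.7.1+cu110' -> (1, 7, 1)."""
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(installed: str, minimum: tuple) -> bool:
    """True if the installed version string is at least the minimum tuple."""
    return parse_version(installed) >= minimum


def check_requirements() -> dict:
    """Report each requirement as 'ok', 'too old', or 'missing'."""
    # Assumed distribution names; only PyTorch has a stated minimum version (1.7).
    requirements = {"torch": (1, 7), "torchvision": None, "tensorboardX": None}
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and not meets_minimum(installed, minimum):
            report[name] = f"too old ({installed})"
        else:
            report[name] = f"ok ({installed})"
    return report


if __name__ == "__main__":
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```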

Quick Start

Train

Download ImageNet-22k pretrained weights from Base16 and Large16.
Prepare the Kinetics-400 dataset, organized in the following structure (the split lists are in trainValTest):

k400
|_ frames331_train
|  |_ [category name 0]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |_ ...
|  |
|  |_ [category name 1]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |_ ...
|  |_ ...
|
|_ frames331_val
|  |_ [category name 0]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |_ ...
|  |
|  |_ [category name 1]
|  |  |_ [video name 0]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |
|  |  |_ [video name 1]
|  |  |  |_ img_00001.jpg
|  |  |  |_ img_00002.jpg
|  |  |  |_ ...
|  |  |_ ...
|  |_ ...
|
|_ trainValTest
   |_ train.txt
   |_ val.txt
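
Before launching a long training run, it can help to verify that the extracted frames follow the layout above. The sketch below walks one split directory (for example `k400/frames331_train`) and counts the `img_XXXXX.jpg` frames in each video folder. The function names are hypothetical, and it does not read `train.txt` or `val.txt`, whose format is not described here.

```python
import re
from pathlib import Path

# Frame files are named img_00001.jpg, img_00002.jpg, ... as in the tree above.
FRAME_PATTERN = re.compile(r"img_\d{5}\.jpg")


def count_frames(split_dir) -> dict:
    """Map 'category/video' to the number of frame files found under split_dir."""
    counts = {}
    for category in sorted(Path(split_dir).iterdir()):
        if not category.is_dir():
            continue
        for video in sorted(category.iterdir()):
            if not video.is_dir():
                continue
            frames = [f for f in video.iterdir() if FRAME_PATTERN.fullmatch(f.name)]
            counts[f"{category.name}/{video.name}"] = len(frames)
    return counts


def find_empty_videos(split_dir) -> list:
    """Return the video folders that contain no frames at all."""
    return [name for name, n in count_frames(split_dir).items() if n == 0]
```

Calling `find_empty_videos("k400/frames331_val")` should return an empty list for a correctly prepared validation split.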

Use the training script (train.sh) to train on Kinetics-400:

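#!/usr/bin/env python
import subprocess

# Fine-tune TokShift-Large (384 resolution, 12 frames) on Kinetics-400,
# starting from the pretrained ViT-L/16 weights, on a single node.
cmd = [
    "python", "-u", "main_ddp_shift_v3.py",
    "--multiprocessing-distributed", "--world-size", "1", "--rank", "0",
    "--dist-ur", "tcp://127.0.0.1:23677",
    "--tune_from", "pretrain/ViT-L_16_Img21.npz",
    "--cfg", "config/custom/kinetics400/k400_tokshift_div4_12x32_large_384.yml",
]
# check=True raises an error if the training process exits with a non-zero status.
subprocess.run(cmd, check=True)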

Test

Use the test script (test.sh) to evaluate on Kinetics-400:

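#!/usr/bin/env python
import subprocess

# Evaluate the ViT (Video) Base16 baseline checkpoint on Kinetics-400 (single node).
cmd = [
    "python", "-u", "main_ddp_shift_v3.py",
    "--multiprocessing-distributed", "--world-size", "1", "--rank", "0",
    "--dist-ur", "tcp://127.0.0.1:23677",
    "--evaluate",
    "--resume", "model_zoo/ViT-B_16_k400_dense_cls400_segs8x32_e18_lr0.1_B21_VAL224/best_vit_B8x32x224_k400.pth",
    "--cfg", "config/custom/kinetics400/k400_vit_8x32_224.yml",
]
# check=True raises an error if the evaluation process exits with a non-zero status.
subprocess.run(cmd, check=True)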
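
Multi-view evaluation, as referenced in the model zoo (30 views for testing, 3 for validation), scores several views of the same video and fuses the results into one prediction. The sketch below assumes the fusion is a plain average of per-view class probabilities. That is a common convention rather than something this README confirms, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_views(view_logits):
    """Average per-view probabilities; return (predicted_class, mean_probabilities)."""
    probs = [softmax(logits) for logits in view_logits]
    num_views = len(probs)
    num_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / num_views for c in range(num_classes)]
    predicted = max(range(num_classes), key=mean.__getitem__)
    return predicted, mean


if __name__ == "__main__":
    # Three views of one video, three classes (toy numbers for illustration only).
    views = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [3.0, 0.0, 0.0]]
    predicted, mean = fuse_views(views)
    print(predicted, [round(p, 3) for p in mean])
```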

Contributors

VideoNet is written and maintained by Dr. Hao Zhang and Dr. Yanbin Hao.

Citing

If you find TokShift-xfmr useful in your research, please cite it using the following BibTeX entry.

@inproceedings{tokshift2021,
  title={Token Shift Transformer for Video Classification},
  author={Zhang, Hao and Hao, Yanbin and Ngo, Chong-Wah},
  booktitle={ACM Multimedia},
  year={2021}
}

Acknowledgement

Thanks to the following GitHub projects:
