CMT: Convolutional Neural Networks Meet Vision Transformers

Last update: Dec 30, 2022

1. Introduction

This repo is a PyTorch implementation of the CMT model. No official reference code was available, so this is a non-official version.

2. Environment

Python 3.7+
PyTorch 1.7.1
pillow
apex
opencv-python

For apex installation instructions, see the NVIDIA apex repository.
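Before training, a quick check can confirm that the interpreter version and the packages listed above are available. This is a minimal sketch: the import names (`torch`, `PIL` for pillow, `apex`, `cv2` for opencv-python) are the packages' usual module names, and `missing_modules` / `check_environment` are hypothetical helpers, not part of this repo.

```python
import importlib.util
import sys

# Usual import names for the listed packages (pillow -> PIL, opencv-python -> cv2).
REQUIRED_MODULES = ["torch", "PIL", "apex", "cv2"]


def missing_modules(names):
    """Hypothetical helper: return the module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Hypothetical helper: check the Python version and report missing packages."""
    if sys.version_info < (3, 7):
        raise RuntimeError("Python 3.7+ is required")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found")
    return missing


if __name__ == "__main__":
    check_environment()
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages like `torch`.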

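3. Dataset

Prepare one list file per split. Each line holds an image path, a comma, and the integer class label (0 to 999 for the 1000 ImageNet classes).

Training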

/data/home/imagenet/train/xxx.jpeg, 0
/data/home/imagenet/train/xxx.jpeg, 1
...
/data/home/imagenet/train/xxx.jpeg, 999

Testing

/data/home/imagenet/test/xxx.jpeg, 0
/data/home/imagenet/test/xxx.jpeg, 1
...
/data/home/imagenet/test/xxx.jpeg, 999
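A minimal sketch of reading this format, assuming every non-empty line is exactly `path, label`. The names `parse_list_file` and `load_list_file` are hypothetical, and the repo's own dataset class may parse the file differently.

```python
from typing import Iterable, List, Tuple


def parse_list_file(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Hypothetical helper: parse "path, label" lines into (path, label) pairs."""
    samples = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last comma so a path containing commas still parses.
        path, label = line.rsplit(",", 1)
        samples.append((path.strip(), int(label)))
    return samples


def load_list_file(filename: str) -> List[Tuple[str, int]]:
    """Hypothetical helper: read a list file such as train.txt or val.txt."""
    with open(filename, "r", encoding="utf-8") as f:
        return parse_list_file(f)


if __name__ == "__main__":
    # Hypothetical example lines in the format shown above.
    example = [
        "/data/home/imagenet/train/a.jpeg, 0\n",
        "/data/home/imagenet/train/b.jpeg, 999\n",
    ]
    print(parse_list_file(example))
```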

4. Training & Inference

Training

CMT-Tiny

#!/bin/bash
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
cd CMT-pytorch
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore -m torch.distributed.launch --nproc_per_node 8 train.py \
--batch_size 512 --num_workers 48 --lr 6e-3 --optimizer_name "adamw" --tf_optimizer 1 --cosine 1 \
--model_name cmtti --max_epochs 300 --warmup_epochs 5 --num-classes 1000 \
--input_size 184 --crop_size 160 --weight_decay 1e-1 --grad_clip 0 --repeated-aug 0 --max_grad_norm 5.0 \
--drop_path_rate 0.1 --FP16 0 --qkv_bias 1 \
--ape 0 --rpe 1 --pe_nd 0 --mode O2 --amp 1 --apex 0 \
--train_file $file_folder$/train.txt \
--val_file $file_folder$/val.txt \
--log-dir $save_folder$/log_dir \
--checkpoints-path $save_folder$/checkpoints

Note: a batch size of 128 × 8 may give higher accuracy; choose the batch size that best balances accuracy and training speed.
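The flags `--lr 6e-3 --cosine 1 --warmup_epochs 5 --max_epochs 300` suggest a linear warmup followed by cosine decay. Below is a minimal per-epoch sketch of such a schedule, assuming a linear ramp from zero and decay to `min_lr` (zero by default). The repo's exact formula (per-step versus per-epoch updates, minimum learning rate) is not documented here and may differ; `lr_at_epoch` is a hypothetical name.

```python
import math


def lr_at_epoch(epoch: float, base_lr: float = 6e-3, warmup_epochs: int = 5,
                max_epochs: int = 300, min_lr: float = 0.0) -> float:
    """Hypothetical schedule: linear warmup to base_lr, then cosine decay to min_lr."""
    if epoch < warmup_epochs:
        # Linear ramp from 0 to base_lr over the warmup epochs.
        return base_lr * epoch / warmup_epochs
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (epoch - warmup_epochs) / (max_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 2.5, 5, 152.5, 300):
        print(e, round(lr_at_epoch(e), 6))
```

In PyTorch, a function like this can drive `torch.optim.lr_scheduler.LambdaLR` by returning the multiplicative factor `lr_at_epoch(e) / base_lr`.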

Inference

#!/bin/bash
cd CMT-pytorch
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore test.py \
--dist-url 'tcp://127.0.0.1:9966' --dist-backend 'nccl' --multiprocessing-distributed=1 --world-size=1 --rank=0 \
--batch-size 128 --num-workers 48 --num-classes 1000 --input_size 184 --crop_size 160 \
--ape 0 --rpe 1 --pe_nd 0 --qkv_bias 1 --swin 0 --model_name cmtti --dropout 0.1 --emb_dropout 0.1 \
--test_file $file_folder$/val.txt \
--checkpoints-path $save_folder$/checkpoints/xxx.pth.tar \
--save_folder $save_folder$/acc_logits/
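Both scripts pass `--input_size 184 --crop_size 160`. A common reading of these flags, assumed here rather than confirmed from the repo's transforms, is: resize the shorter image side to 184 pixels, then take a 160×160 center crop (a crop ratio of 160/184 ≈ 0.87). The sketch computes the resized size and the crop box in the (left, top, right, bottom) order used by Pillow's `Image.crop`; both helpers are hypothetical.

```python
from typing import Tuple


def resize_shorter_side(width: int, height: int, target: int) -> Tuple[int, int]:
    """Hypothetical helper: scale so the shorter side equals target, keeping aspect ratio."""
    if width <= height:
        return target, round(height * target / width)
    return round(width * target / height), target


def center_crop_box(width: int, height: int, crop: int) -> Tuple[int, int, int, int]:
    """Hypothetical helper: (left, top, right, bottom) of a centered crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


if __name__ == "__main__":
    # A hypothetical 500x375 photo, resized to 245x184, then center-cropped to 160x160.
    w, h = resize_shorter_side(500, 375, 184)
    print((w, h), center_crop_box(w, h, 160))
```

With torchvision, this preprocessing is usually written as `transforms.Resize(184)` followed by `transforms.CenterCrop(160)`; whether this repo uses exactly that pipeline (rather than an OpenCV-based one) is an assumption.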

Calculate accuracy

python utils/calculate_acc.py --logits_file $save_folder$/acc_logits/
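The logits format written by `test.py` is not documented here, so the following only sketches what a top-k accuracy computation looks like, assuming one row of class scores per image and an integer ground-truth label. `topk_accuracy` is a hypothetical helper, not the repo's `utils/calculate_acc.py`.

```python
from typing import Sequence


def topk_accuracy(logits: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1) -> float:
    """Hypothetical helper: fraction of samples whose label is among the k top scores."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("no samples to evaluate")
    correct = 0
    for scores, label in zip(logits, labels):
        # Class indices sorted by descending score, truncated to the top k.
        topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)


if __name__ == "__main__":
    logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(topk_accuracy(logits, labels, k=1), topk_accuracy(logits, labels, k=2))
```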

5. ImageNet Results

model-name	input_size	FLOPs	Params	Acc (single crop, ours)	Acc (paper)	weights
CMT-T	160x160	516M	11.3M	75.124%	79.2%	weights
CMT-T	224x224	1.01G	11.3M	78.4%	-	weights
CMT-XS	192x192	-	-	-	81.8%	-
CMT-S	224x224	-	-	-	83.5%	-
CMT-L	256x256	-	-	-	84.5%	-
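As a rough consistency check on the two CMT-T rows: for a network dominated by convolutions and token-wise linear layers, FLOPs grow approximately in proportion to the number of input pixels (attention terms grow faster, so this is only an approximation). Scaling the 160x160 figure:

```latex
\frac{\mathrm{FLOPs}_{224}}{\mathrm{FLOPs}_{160}} \approx \left(\frac{224}{160}\right)^{2} = 1.4^{2} = 1.96,
\qquad 516\,\mathrm{M} \times 1.96 \approx 1011\,\mathrm{M} \approx 1.01\,\mathrm{G}
```

This agrees with the reported 1.01G, while the parameter count (11.3M) is unchanged because it does not depend on input size.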

6. TODO

Results for other model sizes may come soon if there is demand.
Release the CMT-XS result on ImageNet.
Check the differences from the paper; the authors gave the hyperparameters in an issue.
Tune the best hyperparameters for CMT and transformers.

Supplementary

For more details, I have written an explanation of CMT, along with notes on the tuning and training process, here.

Owner: FlyEgle