CoaT: Co-Scale Conv-Attentional Image Transformers

Related tags

Deep LearningCoaT
Overview

CoaT: Co-Scale Conv-Attentional Image Transformers

Introduction

This repository contains the official code and pretrained models for CoaT: Co-Scale Conv-Attentional Image Transformers. It introduces (1) a co-scale mechanism to realize fine-to-coarse, coarse-to-fine and cross-scale attention modeling and (2) an efficient conv-attention module to realize relative position encoding in the factorized attention.

Model Accuracy

For more details, please refer to CoaT: Co-Scale Conv-Attentional Image Transformers by Weijian Xu*, Yifan Xu*, Tyler Chang, and Zhuowen Tu.

Changelog

04/23/2021: Pre-trained checkpoint for CoaT-Lite Mini is released.
04/22/2021: Code and pre-trained checkpoint for CoaT-Lite Tiny are released.

Usage

Environment Preparation

  1. Set up a new conda environment and activate it.

    # Create an environment with Python 3.8.
    conda create -n coat python==3.8
    conda activate coat
  2. Install required packages.

    # Install PyTorch 1.7.1 w/ CUDA 11.0.
    pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
    
    # Install timm 0.3.2.
    pip install timm==0.3.2
    
    # Install einops.
    pip install einops

Code and Dataset Preparation

  1. Clone the repo.

    git clone https://github.com/mlpc-ucsd/CoaT
    cd CoaT
  2. Download ImageNet dataset (ILSVRC 2012) and extract.

    # Create dataset folder.
    mkdir -p ./data/ImageNet
    
    # Download the dataset (not shown here) and copy the files (assume the download path is in $DATASET_PATH).
    cp $DATASET_PATH/ILSVRC2012_img_train.tar $DATASET_PATH/ILSVRC2012_img_val.tar $DATASET_PATH/ILSVRC2012_devkit_t12.tar.gz ./data/ImageNet
    
    # Extract the dataset.
    python -c "from torchvision.datasets import ImageNet; ImageNet('./data/ImageNet', split='train')"
    python -c "from torchvision.datasets import ImageNet; ImageNet('./data/ImageNet', split='val')"
    # After the extraction, you should observe `train` and `val` folders under ./data/ImageNet.

Evaluate Pre-trained Checkpoint

We provide the CoaT checkpoints pre-trained on the ImageNet dataset.

Name [email protected] [email protected] #Params SHA-256 (first 8 chars) URL
CoaT-Lite Tiny 77.5 93.8 5.7M e88e96b0 model, log
CoaT-Lite Mini 79.1 94.5 11M 6b4a8ae5 model, log

The following commands provide an example (CoaT-Lite Tiny) to evaluate the pre-trained checkpoint.

# Download the pretrained checkpoint.
mkdir -p ./output/pretrained
wget http://vcl.ucsd.edu/coat/pretrained/coat_lite_tiny_e88e96b0.pth -P ./output/pretrained
sha256sum ./output/pretrained/coat_lite_tiny_e88e96b0.pth  # Make sure it matches the SHA-256 hash (first 8 characters) in the table.

# Evaluate.
# Usage: bash ./scripts/eval.sh [model name] [output folder] [checkpoint path]
bash ./scripts/eval.sh coat_lite_tiny coat_lite_tiny_pretrained ./output/pretrained/coat_lite_tiny_e88e96b0.pth
# It should output results similar to "[email protected] 77.504 [email protected] 93.814" at very last.

Train

The following commands provide an example (CoaT-Lite Tiny, 8-GPU) to train the CoaT model.

# Usage: bash ./scripts/train.sh [model name] [output folder]
bash ./scripts/train.sh coat_lite_tiny coat_lite_tiny

Evaluate

The following commands provide an example (CoaT-Lite Tiny) to evaluate the checkpoint after training.

# Usage: bash ./scripts/eval.sh [model name] [output folder] [checkpoint path]
bash ./scripts/eval.sh coat_lite_tiny coat_lite_tiny_eval ./output/coat_lite_tiny/checkpoints/checkpoint0299.pth

Citation

@misc{xu2021coscale,
      title={Co-Scale Conv-Attentional Image Transformers}, 
      author={Weijian Xu and Yifan Xu and Tyler Chang and Zhuowen Tu},
      year={2021},
      eprint={2104.06399},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

This repository is released under the Apache License 2.0. License can be found in LICENSE file.

Acknowledgment

Thanks to DeiT and pytorch-image-models for a clear and data-efficient implementation of ViT. Thanks to lucidrains' implementation of Lambda Networks and CPVT.

Owner
mlpc-ucsd
mlpc-ucsd
GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation. (CVPR 2021)

GDR-Net This repo provides the PyTorch implementation of the work: Gu Wang, Fabian Manhardt, Federico Tombari, Xiangyang Ji. GDR-Net: Geometry-Guided

169 Jan 07, 2023
Boost learning for GNNs from the graph structure under challenging heterophily settings. (NeurIPS'20)

Beyond Homophily in Graph Neural Networks: Current Limitations and Effective Designs Jiong Zhu, Yujun Yan, Lingxiao Zhao, Mark Heimann, Leman Akoglu,

GEMS Lab: Graph Exploration & Mining at Scale, University of Michigan 70 Dec 18, 2022
Contrastive Feature Loss for Image Prediction

Contrastive Feature Loss for Image Prediction We provide a PyTorch implementation of our contrastive feature loss presented in: Contrastive Feature Lo

Alex Andonian 44 Oct 05, 2022
CAUSE: Causality from AttribUtions on Sequence of Events

CAUSE: Causality from AttribUtions on Sequence of Events

Wei Zhang 21 Dec 01, 2022
A "gym" style toolkit for building lightweight Neural Architecture Search systems

A "gym" style toolkit for building lightweight Neural Architecture Search systems

Jack Turner 12 Nov 05, 2022
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning

VisualGPT Our Paper VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning Main Architecture of Our VisualGPT Downloa

Vision CAIR Research Group, KAUST 140 Dec 28, 2022
Official Pytorch Implementation of 3DV2021 paper: SAFA: Structure Aware Face Animation.

SAFA: Structure Aware Face Animation (3DV2021) Official Pytorch Implementation of 3DV2021 paper: SAFA: Structure Aware Face Animation. Getting Started

QiulinW 122 Dec 23, 2022
Lecture materials for Cornell CS5785 Applied Machine Learning (Fall 2021)

Applied Machine Learning (Cornell CS5785, Fall 2021) This repo contains executable course notes and slides for the Applied ML course at Cornell and Co

Volodymyr Kuleshov 103 Dec 31, 2022
A collection of loss functions for medical image segmentation

A collection of loss functions for medical image segmentation

Jun 3.1k Jan 03, 2023
Code to produce syntactic representations that can be used to study syntax processing in the human brain

Can fMRI reveal the representation of syntactic structure in the brain? The code base for our paper on understanding syntactic representations in the

Aniketh Janardhan Reddy 4 Dec 18, 2022
Code accompanying "Learning What To Do by Simulating the Past", ICLR 2021.

Learning What To Do by Simulating the Past This repository contains code that implements the Deep Reward Learning by Simulating the Past (Deep RSLP) a

Center for Human-Compatible AI 24 Aug 07, 2021
PerfFuzz: Automatically Generate Pathological Inputs for C/C++ programs

PerfFuzz Performance problems in software can arise unexpectedly when programs are provided with inputs that exhibit pathological behavior. But how ca

Caroline Lemieux 125 Nov 18, 2022
A Python training and inference implementation of Yolov5 helmet detection in Jetson Xavier nx and Jetson nano

yolov5-helmet-detection-python A Python implementation of Yolov5 to detect head or helmet in the wild in Jetson Xavier nx and Jetson nano. In Jetson X

12 Dec 05, 2022
Make your master artistic punk avatar through machine learning world famous paintings.

Master-art-punk Make your master artistic punk avatar through machine learning world famous paintings. 通过机器学习世界名画制作属于你的大师级艺术朋克头像 Nowadays, NFT is beco

Philipjhc 53 Dec 27, 2022
MADE (Masked Autoencoder Density Estimation) implementation in PyTorch

pytorch-made This code is an implementation of "Masked AutoEncoder for Density Estimation" by Germain et al., 2015. The core idea is that you can turn

Andrej 498 Dec 30, 2022
Keras implementation of Normalizer-Free Networks and SGD - Adaptive Gradient Clipping

Keras implementation of Normalizer-Free Networks and SGD - Adaptive Gradient Clipping

Yam Peleg 63 Sep 21, 2022
Deep Residual Learning for Image Recognition

Deep Residual Learning for Image Recognition This is a Torch implementation of "Deep Residual Learning for Image Recognition",Kaiming He, Xiangyu Zhan

Kimmy 561 Dec 01, 2022
SMPL-X: A new joint 3D model of the human body, face and hands together

SMPL-X: A new joint 3D model of the human body, face and hands together [Paper Page] [Paper] [Supp. Mat.] Table of Contents License Description News I

Vassilis Choutas 1k Jan 09, 2023
Transformer Tracking (CVPR2021)

TransT - Transformer Tracking [CVPR2021] Official implementation of the TransT (CVPR2021) , including training code and trained models. We are revisin

chenxin 465 Jan 06, 2023
LibMTL: A PyTorch Library for Multi-Task Learning

LibMTL LibMTL is an open-source library built on PyTorch for Multi-Task Learning (MTL). See the latest documentation for detailed introductions and AP

765 Jan 06, 2023