This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?".

Overview

Patches Are All You Need? 🤷

Code overview

The most important code is in convmixer.py. We trained ConvMixers using the timm framework, which we copied from here.

Update: ConvMixer is now integrated into the timm framework itself. You can see the PR here.

Inside pytorch-image-models, we have made the following modifications. (Though one could look at the diff, we think it is convenient to summarize them here.)

  • Added ConvMixers
    • added timm/models/convmixer.py
    • modified timm/models/__init__.py
  • Added "OneCycle" LR Schedule
    • added timm/scheduler/onecycle_lr.py
    • modified timm/scheduler/scheduler.py
    • modified timm/scheduler/scheduler_factory.py
    • modified timm/scheduler/__init__.py
    • modified train.py (added two lines to support this LR schedule)

We are confident that the use of the OneCycle schedule here is not critical, and one could likely just as well train ConvMixers with the built-in cosine schedule.
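
For intuition about the two schedules mentioned above, here is a minimal sketch in plain Python. It is an illustration only: the function names, the piecewise-linear shape, and the `peak_frac=0.4` default are assumptions made for this example. They are not taken from timm/scheduler/onecycle_lr.py or from timm's built-in cosine scheduler.

```python
import math


def onecycle_lr(step, total_steps, max_lr=0.01, peak_frac=0.4):
    """Piecewise-linear one-cycle schedule (hypothetical shape, not timm's code).

    Ramps linearly from 0 up to max_lr over the first `peak_frac` of training,
    then decays linearly back to 0 by the final step.
    """
    if not 0 < peak_frac < 1:
        raise ValueError("peak_frac must lie strictly between 0 and 1")
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    peak_step = peak_frac * total_steps
    if step <= peak_step:
        return max_lr * step / peak_step
    return max_lr * (total_steps - step) / (total_steps - peak_step)


def cosine_lr(step, total_steps, max_lr=0.01):
    """Plain half-cosine decay from max_lr to 0, with no warmup or restarts."""
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * step / total_steps))


if __name__ == "__main__":
    # Print both schedules side by side at a few points of a 100-step run.
    for s in (0, 20, 40, 70, 100):
        print(s, round(onecycle_lr(s, 100), 5), round(cosine_lr(s, 100), 5))
```

Both curves end at zero. The main difference in this sketch is that the one-cycle curve starts at zero and rises to its peak, while the cosine curve starts at its peak.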

Evaluation

We provide some model weights below:

Model Name          Kernel Size   Patch Size   File Size
ConvMixer-1536/20   9             7            207MB
ConvMixer-768/32*   7             7            85MB
ConvMixer-1024/20   9             14           98MB

* Important: ConvMixer-768/32 here uses ReLU instead of GELU, so you would have to change convmixer.py accordingly (we will fix this later).

You can evaluate ConvMixer-1536/20 as follows:

python validate.py --model convmixer_1536_20 --b 64 --num-classes 1000 --checkpoint [/path/to/convmixer_1536_20_ks9_p7.pth.tar] [/path/to/ImageNet1k-val]

You should get 81.37% accuracy.

Training

If you had a node with 10 GPUs, you could train a ConvMixer-1536/20 as follows (these are exactly the settings we used):

sh distributed_train.sh 10 [/path/to/ImageNet1k] \
    --train-split [your_train_dir] \
    --val-split [your_val_dir] \
    --model convmixer_1536_20 \
    -b 64 \
    -j 10 \
    --opt adamw \
    --epochs 150 \
    --sched onecycle \
    --amp \
    --input-size 3 224 224 \
    --lr 0.01 \
    --aa rand-m9-mstd0.5-inc1 \
    --cutmix 0.5 \
    --mixup 0.5 \
    --reprob 0.25 \
    --remode pixel \
    --num-classes 1000 \
    --warmup-epochs 0 \
    --opt-eps=1e-3 \
    --clip-grad 1.0

We also included a ConvMixer-768/32 in timm/models/convmixer.py (though it is simple to add more ConvMixers). We trained that one with the above settings but with 300 epochs instead of 150 epochs.
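
To make the scale of these settings concrete, the sketch below works out the global batch size and the number of optimizer steps. The helper name `training_steps` is hypothetical, and the ImageNet-1k training-set size of 1,281,167 images is the standard figure rather than something stated in this README. The calculation also assumes one optimizer step per global batch, with the final partial batch kept.

```python
import math

# Standard ILSVRC-2012 training-set size (assumption: not stated in this README).
IMAGENET1K_TRAIN_IMAGES = 1_281_167


def training_steps(num_gpus, per_gpu_batch, epochs,
                   num_images=IMAGENET1K_TRAIN_IMAGES):
    """Return (global_batch, steps_per_epoch, total_steps) for a data-parallel run."""
    global_batch = num_gpus * per_gpu_batch
    # Assumes the last, partially filled batch still counts as one step.
    steps_per_epoch = math.ceil(num_images / global_batch)
    return global_batch, steps_per_epoch, steps_per_epoch * epochs


if __name__ == "__main__":
    print(training_steps(num_gpus=10, per_gpu_batch=64, epochs=150))  # ConvMixer-1536/20
    print(training_steps(num_gpus=10, per_gpu_batch=64, epochs=300))  # ConvMixer-768/32
```

With 10 GPUs and `-b 64`, the global batch is 640 images. Under the assumptions above, the 300-epoch run takes twice as many steps as the 150-epoch run.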

In the near future, we will upload weights.

The tweetable version of ConvMixer, which requires from torch.nn import *:

def ConvMixr(h,d,k,p,n):
 S,C,A=Sequential,Conv2d,lambda x:S(x,GELU(),BatchNorm2d(h))
 R=type('',(S,),{'forward':lambda s,x:s[0](x)+x})
 return S(A(C(3,h,p,p)),*[S(R(A(C(h,h,k,groups=h,padding=k//2))),A(C(h,h,1))) for i in range(d)],AdaptiveAvgPool2d((1,1)),Flatten(),Linear(h,n))
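
Reading the layers off the one-liner above, one can count the learnable parameters by hand. The sketch below does this in plain Python, with no PyTorch needed. The function name `convmixer_param_count` is hypothetical. It assumes that `h`, `d`, `k`, `p`, and `n` are the hidden width, depth, kernel size, patch size, and number of classes, that every convolution and the final linear layer keep their default bias, and that each BatchNorm2d contributes a weight and a bias per channel. Running statistics are buffers and are not counted.

```python
def convmixer_param_count(h, d, k, p, n=1000, in_chans=3):
    """Count learnable parameters of the tweetable ConvMixer(h, d, k, p, n)."""
    bn = 2 * h                                # BatchNorm2d: weight + bias per channel
    stem = in_chans * h * p * p + h + bn      # patch embedding: conv with kernel = stride = p
    depthwise = h * k * k + h + bn            # k x k conv with groups=h (one filter per channel)
    pointwise = h * h + h + bn                # 1 x 1 conv mixing channels
    head = h * n + n                          # final Linear(h, n)
    return stem + d * (depthwise + pointwise) + head


if __name__ == "__main__":
    # (h, d, k, p) taken from the model table in the Evaluation section.
    for name, cfg in {
        "ConvMixer-1536/20": (1536, 20, 9, 7),
        "ConvMixer-768/32": (768, 32, 7, 7),
        "ConvMixer-1024/20": (1024, 20, 9, 14),
    }.items():
        count = convmixer_param_count(*cfg)
        # 4 bytes per float32 parameter gives a rough checkpoint size.
        print(f"{name}: {count:,} parameters, about {count * 4 / 1e6:.1f} MB as float32")
```

At 4 bytes per parameter, these counts come out close to the checkpoint file sizes listed in the Evaluation section, which is a useful sanity check on the layer-by-layer reading.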

Comments

  • Cifar10 baseline doesn't reach 95%

    Hello, I tried convmixer256 on CIFAR-10 with the same timm options specified for ImageNet (except num_classes), and it doesn't go beyond 90% accuracy. Could you please specify the options used for the CIFAR-10 experiment?

    opened by K-H-Ismail 13
  • What's new about this model?

    Why are "patches" all you need? The patch embedding is a Conv7x7 stem, and the body is simply repeated Conv9x9 + Conv1x1. I'm not challenging your work, which is indeed very interesting; I'm just wondering what's new about this model.

    opened by vztu 5
  • Training scheme modifications for small GPUs

    Hi authors. Your paper demonstrates quite an intriguing observation, and I wish you luck with your submission. Thanks for sharing the code. When running it, I hit an out-of-memory (OOM) error with the default batch size of 64. In the end I could only run with 8 samples per batch per GPU, as my GPUs have only 11GB of memory. I would like to know whether you have tried smaller GPUs and achieved the same results. So far, besides scaling the learning rate according to the linear rule, I haven't made any other changes. If you have tried training on smaller GPUs, could you please share your experience? Thank you very much!

    opened by justanhduc 4
  • Experiments with full convolutional layers instead of patch embedding?

    Have the authors tried replacing the patch embedding with a plain convolution, that is, using a stride of 1 instead of p?

    With this setting, the model is a standard convolutional network like MobileNet. I wonder what the performance would be. Is the performance gain of ConvMixer due to the patch embedding or to the depthwise conv layers?

    Very interested in this work, thanks.

    opened by forjiuzhou 2
  • Training time

    Hi, first of all thanks for a very interesting paper.

    I would like to know how long it took you to train the models. I'm trying to train ConvMixer-768/32 on 2xV100, and one epoch takes ~3 hours, so I estimate that full training would take 2 * 3 * 300 = 1800 GPU hours, which is insane. Even if you trained with 10 GPUs, it would take ~1 week for one experiment to finish. Are my calculations correct?

    opened by bonlime 1
  • padding=same?

    https://github.com/tmp-iclr/convmixer/blob/1cefd860a1a6a85369887d1a633425cedc2afd0a/convmixer.py#L18 raises an error: TypeError: conv2d(): argument 'padding' (position 5) must be tuple of ints, not str.

    opened by linhaoqi027 1
  • Add Docker environment & web demo

    Hey @ashertrockman, @tmp-iclr! 👋

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model! View it here: https://replicate.com/locuslab/convmixer and have a look at some Image classification examples we already uploaded.

    By clicking "Claim this model" you'll be able to edit everything, and we'll feature it on our website and tweet about it too.

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊

    opened by ariel415el 0
  • Fix notebooks

    Hi.

    Fixed errors in the pytorch-image-models/notebooks/{EffResNetComparison,GeneralizationToImageNetV2}.ipynb notebooks:

    • added the missing pynvml installation;
    • resolved missing imports;
    • resolved errors due to outdated calls to the timm library.

    Tested in colab env: "Run all" without any errors.

    opened by amrzv 0
  • CIFAR-10 training settings

    First of all, thank you for the interesting work. I was experimenting with the model with patch size 1 and kernel size 9 on CIFAR-10, using the following training settings:

    --model tiny_convmixer
     -b 64 -j 8 
    --opt adamw 
    --epochs 200 
    --sched onecycle 
    --amp 
    --input-size 3 32 32 
    --lr 0.01 
    --aa rand-m9-mstd0.5-inc1 
    --cutmix 0.5 
    --mixup 0.5 
    --reprob 0.25 
    --remode pixel 
    --num-classes 10
    --warmup-epochs 0
    --opt-eps 1e-3
    --clip-grad 1.0
    --scale 0.75 1.0
    --weight-decay 0.01
    --mean 0.4914 0.4822 0.4465
    --std 0.2471 0.2435 0.2616
    

    I could only get 95.89%, whereas Table 4 in the paper reports 96.03%. Could you please let me know which setting I missed? Thank you again.

    opened by fugokidi 0
  • Segmentation ConvMixer architecture ?

    I was trying to figure out what a segmentation ConvMixer would look like, and came up with this (residual connection inspired by MultiResUNet). Does it make sense to you?

    image

    opened by divideconcept 0
  • Request more experiment results to compare to other architecture.

    Hi! This work is pretty interesting, but I think there should be more results, as in "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight", which replaces local self-attention with depth-wise convolution in Swin Transformer. Since you go further with a simpler architecture than Swin Transformer, I wonder whether ConvMixer can reach similar performance on object detection and semantic segmentation.

    opened by LuoXin-s 1

Releases: timm-v1.0

Owner: ICLR 2022 Author