Implementation of "A MLP-like Architecture for Dense Prediction"

Last update: Dec 27, 2022

Overview

A MLP-like Architecture for Dense Prediction (arXiv)

Updates

(22/07/2021) Initial release.

Model Zoo

We provide CycleMLP models pretrained on ImageNet 2012.

Model	Parameters	FLOPs	Top 1 Acc.	Download
CycleMLP-B1	15M	2.1G	78.9%	model
CycleMLP-B2	27M	3.9G	81.6%	model
CycleMLP-B3	38M	6.9G	82.4%	model
CycleMLP-B4	52M	10.1G	83.0%	model
CycleMLP-B5	76M	12.3G	83.2%	model

Usage

Install

PyTorch 1.7.0+ and torchvision 0.8.1+
timm:

pip install 'git+https://github.com/rwightman/pytorch-image-models@c2ba229d995c33aaaf20e00a5686b4dc857044be'

or

git clone https://github.com/rwightman/pytorch-image-models
cd pytorch-image-models
git checkout c2ba229d995c33aaaf20e00a5686b4dc857044be
pip install -e .

fvcore (optional, for FLOPs calculation)
mmcv, mmdetection, mmsegmentation (optional)

Data preparation

Download and extract ImageNet train and val images from http://image-net.org/. The directory structure is:

│path/to/imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
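Before training, it can help to confirm that the folders match this layout. The sketch below uses two hypothetical helpers that are not part of the repository. It counts class folders and `.JPEG` files per split using only `pathlib`, and reports any class folder that appears in one split but not the other. For the full ImageNet-1k, both splits should list the same 1000 class folders.

```python
from pathlib import Path


def summarize_imagenet_split(root, split):
    """Map each class folder under root/split to its number of .JPEG files.

    Assumes the layout above: one sub-directory per WordNet ID
    (e.g. n01440764), each holding that class's JPEG images.
    """
    split_dir = Path(root) / split
    class_dirs = sorted(d for d in split_dir.iterdir() if d.is_dir())
    return {
        d.name: sum(1 for f in d.iterdir() if f.suffix == ".JPEG")
        for d in class_dirs
    }


def check_layout(root):
    """Return (num_train_classes, num_val_classes, classes present in only one split)."""
    train = summarize_imagenet_split(root, "train")
    val = summarize_imagenet_split(root, "val")
    mismatched = sorted(set(train) ^ set(val))
    return len(train), len(val), mismatched


if __name__ == "__main__":
    import sys

    # Usage: python check_layout.py /path/to/imagenet
    print(check_layout(sys.argv[1]))
```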

Evaluation

To evaluate a pre-trained CycleMLP-B5 on ImageNet val with a single GPU, run:

python main.py --eval --model CycleMLP_B5 --resume path/to/CycleMLP_B5.pth --data-path /path/to/imagenet

Training

To train CycleMLP-B5 on ImageNet on a single node with 8 GPUs for 300 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --model CycleMLP_B5 --batch-size 128 --data-path /path/to/imagenet --output_dir /path/to/save
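With `--nproc_per_node=8` and `--batch-size 128`, each optimizer step processes 8 × 128 = 1024 images. DeiT, which this code builds on, linearly scales the learning rate as `lr * total_batch / 512`. The sketch below shows the resulting rate, assuming `main.py` keeps both that rule and DeiT's default `--lr 5e-4`. Both are assumptions to check against `main.py`, and the helper names are hypothetical.

```python
def effective_batch_size(nproc_per_node, batch_size_per_gpu, num_nodes=1):
    """Images per optimizer step, summed over all processes."""
    return nproc_per_node * batch_size_per_gpu * num_nodes


def linear_scaled_lr(base_lr, total_batch_size, reference_batch_size=512):
    """DeiT-style linear scaling rule: base_lr * total_batch_size / 512."""
    return base_lr * total_batch_size / reference_batch_size


if __name__ == "__main__":
    total = effective_batch_size(nproc_per_node=8, batch_size_per_gpu=128)
    # base_lr=5e-4 is DeiT's default --lr (assumed; check main.py)
    print(total, linear_scaled_lr(5e-4, total))
```

Under the same assumptions, changing `--batch-size` or the number of GPUs changes the effective learning rate proportionally.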

Acknowledgement

This code is based on DeiT and pytorch-image-models. Thanks for their wonderful work.

Citing

@article{chen2021cyclemlp,
  title={CycleMLP: A MLP-like Architecture for Dense Prediction},
  author={Chen, Shoufa and Xie, Enze and Ge, Chongjian and Liang, Ding and Luo, Ping},
  journal={arXiv preprint arXiv:2107.10224},
  year={2021}
}

License

CycleMLP is released under the MIT License.

Comments

detection result

Using the PVT detection framework, I trained a CycleMLP-B1-based detector with RetinaNet 1x. I got AP=27.1, far below the reported 38.6. Could you give some advice on reproducing the reported result?

The specific configuration is as follows:

_base_ = [
    '_base_/models/retinanet_r50_fpn.py',
    '_base_/datasets/coco_detection.py',
    '_base_/schedules/schedule_1x.py',
    '_base_/default_runtime.py'
]
# optimizer
model = dict(
    pretrained='./pretrained/CycleMLP_B1.pth',
    backbone=dict(
        type='CycleMLP_B1_feat',
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[64, 128, 320, 512],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=5))
# optimizer
optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)

find_unused_parameters = True

opened by mountain111 6
Compiling CycleMLP

Thank you for this great repo and interesting paper.

I tried compiling CycleMLP to ONNX and, not surprisingly, the process failed. CycleMLP creates its offsets dynamically in https://github.com/ShoufaChen/CycleMLP/blob/main/cycle_mlp.py#L132, so it cannot be converted to a frozen graph. Were you able to convert CycleMLP to ONNX or any other frozen-graph framework?

Thanks in advance.

opened by shairoz-deci 6
Questions about offset calculation

Hi, thanks for your wonderful work.

I'm currently studying your work and came up with some questions about the offset calculation.

I understand the offset calculation described in the paper, but I can't understand how the generated offset is used in the code.

For example, with $S_H \times S_W = 3 \times 1$, I understand how the offset is applied in the paper's figure and can calculate it myself.

However, when I run the offset-generating code, I can't figure out how this offset is used in deform_conv2d.

Can you provide more detailed information about this??

Also, the paper describes how $S_H \times S_W = 3 \times 3$ works, but in the code it seems that either kernel_size[0] or kernel_size[1] has to be 1. So, if I want to use $S_H \times S_W = 3 \times 3$, do I have to create $3 \times 1$ and $1 \times 3$ offsets and add them together?
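To see which shift each channel receives, the formula in `gen_offset` can be reproduced in plain Python. This is a minimal sketch with two assumptions: the 1 x K case uses the expression quoted in the offset issue further down this page, and the K x 1 case is its transpose. `cycle_offsets` is a hypothetical name, not the repository's function. It returns one `(dy, dx)` pair per channel.

```python
def cycle_offsets(num_channels, kernel_size):
    """Per-channel (dy, dx) offsets following the gen_offset formula.

    For a 1 x K kernel, channel i is shifted horizontally by
    (i + start_idx) % K - K // 2, where start_idx = (kh * kw) // 2,
    and its vertical offset is 0. The K x 1 case is assumed to be
    the transposed version.
    """
    kh, kw = kernel_size
    if kh != 1 and kw != 1:
        raise ValueError("one of kernel_size[0] / kernel_size[1] must be 1")
    start_idx = (kh * kw) // 2
    offsets = []
    for i in range(num_channels):
        if kh == 1:
            # 1 x K: horizontal shift only
            offsets.append((0, (i + start_idx) % kw - kw // 2))
        else:
            # K x 1: vertical shift only
            offsets.append(((i + start_idx) % kh - kh // 2, 0))
    return offsets


if __name__ == "__main__":
    print(cycle_offsets(6, (1, 3)))
    print(cycle_offsets(6, (3, 1)))
```

For a 1 x 3 kernel, `start_idx` is 1, so channel 0 gets offset 0 and the horizontal pattern runs 0, 1, -1, 0, 1, -1, and so on. The sketch only makes the 1-D offsets explicit; it does not settle the 3 x 3 question.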

Thank you again for your work. I really learned a lot.

opened by tae-mo 5
Example of CycleMLP Configuration for Dense Prediction

Hello.

First of all, thank you for curating this interesting work. I was wondering, are there any working examples of how I can use CycleMLP for dense prediction while maintaining the original input size (e.g., predict a 0 or 1 value for each pixel in an input image)? In addition, I am interested in only a single ("annotated") output image, although I noticed the model definitions given in this repository output multiple downsampled versions of the original input image. Any thoughts on this?
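For per-pixel prediction at the input resolution, one common pattern is to take the finest pyramid level and upsample it back to H x W. This is an assumption here, not something this repository provides. The sketch below only does the shape bookkeeping, assuming a four-stage pyramid with strides 4, 8, 16 and 32. The channel widths `[64, 128, 320, 512]` are the FPN `in_channels` from the CycleMLP-B1 detection config earlier on this page. `stage_shapes` and `upsample_nearest` are hypothetical helpers.

```python
def stage_shapes(height, width, strides=(4, 8, 16, 32), channels=(64, 128, 320, 512)):
    """(channels, h, w) of each pyramid level for an H x W input (assumed strides)."""
    if height % strides[-1] or width % strides[-1]:
        raise ValueError(f"height and width should be divisible by {strides[-1]}")
    return [(c, height // s, width // s) for s, c in zip(strides, channels)]


def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2-D list by an integer factor."""
    return [
        [value for value in row for _ in range(factor)]  # repeat each column
        for row in grid
        for _ in range(factor)  # repeat each row
    ]


if __name__ == "__main__":
    for shape in stage_shapes(512, 512):
        print(shape)
    print(upsample_nearest([[0, 1], [1, 0]], 2))
```

Under these assumptions, a 512 x 512 input gives a 128 x 128 map at stride 4. A single-channel head on that map (another assumption) would then be upsampled by 4 to produce one value per input pixel.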

Thank you in advance for your time.

opened by amorehead 2
Swin-B vs CycleMLP-B on image classification

For classification on ImageNet-1k, the accuracy of Swin-B is 83.5, which is 0.1 higher than that of the proposed CycleMLP-B. However, in this paper the authors report the accuracy of Swin-B as 83.3, which is 0.1 lower than CycleMLP-B. Why are these accuracies different?

opened by hkzhang91 1

question about the offset

Thanks for your work!

The implementation in this code inspired me, but the offset calculation here is confusing. Although this issue (https://github.com/ShoufaChen/CycleMLP/issues/10) asked a similar question, I haven't found a reasonable explanation.

https://github.com/ShoufaChen/CycleMLP/blob/2f76a1f6e3cc6672143fdac46e3db5f9a7341253/cycle_mlp.py#L127-L136

import torch

num_channels = 6
offset = torch.empty(1, 2 * num_channels, 1, 1)
kernel_size = (1, 3)
start_idx = (kernel_size[0] * kernel_size[1]) // 2
for i in range(num_channels):
    offset[0, 2 * i + 0, 0, 0] = 0
    # relative offset
    offset[0, 2 * i + 1, 0, 0] = (i + start_idx) % kernel_size[1] - (kernel_size[1] // 2)
offset.reshape(num_channels, 2)

tensor([[ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.],
        [ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.]])

The results differ from the figure in the paper.

Some code for verification:

import torch
from torchvision.ops import deform_conv2d

num_channels = 6

data = torch.arange(1, 6).reshape(1, 1, 1, 5).expand(-1, num_channels, -1, -1)
data
"""
tensor([[[[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]],
         [[1, 2, 3, 4, 5]]]])
"""

weight = torch.eye(num_channels).reshape(num_channels, num_channels, 1, 1)
weight.reshape(num_channels, num_channels)
"""
tensor([[1., 0., 0., 0., 0., 0.],
        [0., 1., 0., 0., 0., 0.],
        [0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 1., 0., 0.],
        [0., 0., 0., 0., 1., 0.],
        [0., 0., 0., 0., 0., 1.]])
"""

offset = torch.empty(1, 2 * num_channels * 1 * 1, 1, 1)
kernel_size = (1, 3)
start_idx = (kernel_size[0] * kernel_size[1]) // 2
for i in range(num_channels):
    offset[0, 2 * i + 0, 0, 0] = 0
    # relative offset
    offset[0, 2 * i + 1, 0, 0] = (
        (i + start_idx) % kernel_size[1] - (kernel_size[1] // 2)
    )
offset.reshape(num_channels, 2)
"""
tensor([[ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.],
        [ 0.,  0.],
        [ 0.,  1.],
        [ 0., -1.]])
"""

deform_conv2d(
    data.float(), 
    offset=offset.expand(-1, -1, -1, 5).float(), 
    weight=weight.float(), 
    bias=None,
)
"""
tensor([[[[1., 2., 3., 4., 5.]],
         [[2., 3., 4., 5., 0.]],
         [[0., 1., 2., 3., 4.]],
         [[1., 2., 3., 4., 5.]],
         [[2., 3., 4., 5., 0.]],
         [[0., 1., 2., 3., 4.]]]])
"""
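The output above is consistent with a simple reading of `deform_conv2d` in this setting. With a 1 x 1 kernel and integer offsets, each channel is sampled at `x + dx` with no bilinear blending, and positions outside the input read as 0. The plain-Python sketch below reproduces that zero-padded per-channel shift; the helper names are hypothetical. With the identity `weight` used above, it matches the tensor printed by `deform_conv2d`. In the real layer the weight then mixes channels, which this sketch omits.

```python
def shift_channel(row, dx):
    """Read row[x + dx] for every x, using 0 where x + dx falls outside the row."""
    n = len(row)
    return [row[x + dx] if 0 <= x + dx < n else 0 for x in range(n)]


def cycle_shift(channels, dxs):
    """Apply one horizontal integer offset per channel (1 x W feature rows)."""
    if len(channels) != len(dxs):
        raise ValueError("need exactly one offset per channel")
    return [shift_channel(row, dx) for row, dx in zip(channels, dxs)]


if __name__ == "__main__":
    data = [[1, 2, 3, 4, 5] for _ in range(6)]
    # dx values taken from the second column of the offset printed above
    for row in cycle_shift(data, [0, 1, -1, 0, 1, -1]):
        print(row)
```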

opened by lartpang 1

question about the offset

Hi, thank you very much for your excellent work. In Fig. 4 of your paper, you show the pseudo-kernel when the kernel size is 1x3. However, I find that the function "gen_offset" does not generate the same offset as Fig. 4. The offset it generates is "0,1,0,-1,0,0,0,1..." instead of "0,1,0,-1,0,1,0,-1", which is shown in Fig. 4. Could you please tell me the reason?

opened by linjing7 1
About "crop_pct"

Hi, thanks for your great work and code. I wonder which part of the code actually uses the crop_pct parameter. When I go through timm, I can't find where crop_pct is loaded.
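In timm's evaluation transforms, `crop_pct` conventionally sets the resize size used before the center crop. The shorter side is resized to `floor(img_size / crop_pct)` and then center-cropped to `img_size`. The sketch below only illustrates that arithmetic, and `eval_resize_size` is a hypothetical helper. It is not confirmed here whether `main.py` routes evaluation through timm's data-config helpers and therefore reads `crop_pct`.

```python
import math


def eval_resize_size(img_size, crop_pct):
    """Shorter-side size to resize to before center-cropping to img_size."""
    if not 0 < crop_pct <= 1:
        raise ValueError("crop_pct should be in (0, 1]")
    return math.floor(img_size / crop_pct)


if __name__ == "__main__":
    for pct in (0.875, 0.9, 1.0):
        print(pct, eval_resize_size(224, pct))
```

For example, `crop_pct=0.875` at 224 px gives a 256 px resize, and `crop_pct=0.9` gives 248 px.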

opened by ggjy 1
How to deploy CycleMLP-T for training?

Thank you very much for such wonderful work!

After studying the cycle_mlp source code in the repository, I am still confused about how to deploy the CycleMLP block based on Swin Transformer. Would it be convenient for you to release the Swin-based CycleMLP? Looking forward to your reply. Thanks!

opened by Pak287 0

Releases (v0.1)

v0.1 (Jul 21, 2021)

Source code (tar.gz)
Source code (zip)
CycleMLP_B1.pth (57.92 MB)
CycleMLP_B2.pth (102.45 MB)
CycleMLP_B3.pth (146.75 MB)
CycleMLP_B4.pth (198.05 MB)
CycleMLP_B5.pth (289.38 MB)
CycleMLP_base.pth (335.13 MB)
CycleMLP_small.pth (189.52 MB)
CycleMLP_tiny.pth (108.05 MB)

Owner

Shoufa Chen
