A general framework for efficient CNN inference. It reduces the inference latency of MobileNet-V3 by 1.3x on an iPhone XS Max without sacrificing accuracy.

Last update: Oct 28, 2022

Overview

GFNet-Pytorch (NeurIPS 2020)

This repo contains the official code and pre-trained models for the glance and focus network (GFNet).

Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification

Citation

@inproceedings{NeurIPS2020_7866,
        title = {Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification},
       author = {Wang, Yulin and Lv, Kangchen and Huang, Rui and Song, Shiji and Yang, Le and Huang, Gao},
    booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
         year = {2020},
}

Update on 2020/10/08: Release Pre-trained Models and the Inference Code on ImageNet.

Update on 2020/12/28: Release Training Code.

Introduction

Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification by processing a sequence of relatively small inputs, which are strategically cropped from the original image. Experiments on ImageNet show that our method consistently improves the computational efficiency of a wide variety of deep models. For example, it further reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by 20% without sacrificing accuracy.
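The idea of processing a sequence of small inputs and stopping once the prediction is reliable can be sketched as follows. This is only a minimal illustration, assuming early exit on a per-step confidence threshold (the checkpoints described below store such thresholds). The names `glance_and_focus`, `glance`, and `focus` are hypothetical stand-ins, not functions from this repo.

```python
from typing import Callable, List, Sequence, Tuple

Probs = List[float]  # class probabilities produced after one step


def glance_and_focus(
    glance: Callable[[], Probs],      # hypothetical: classify the first small input
    focus: Callable[[int], Probs],    # hypothetical: classify after the t-th extra crop
    thresholds: Sequence[float],      # one confidence threshold per non-final step
    max_steps: int,                   # maximum length of the input sequence
) -> Tuple[int, int]:
    """Return (predicted_class, steps_used) with confidence-based early exit."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    probs = glance()
    for step in range(1, max_steps + 1):
        confidence = max(probs)
        predicted = probs.index(confidence)
        # Stop at the last step, or as soon as the prediction is confident enough.
        if step == max_steps or confidence >= thresholds[step - 1]:
            return predicted, step
        # Otherwise process one more cropped patch and update the prediction.
        probs = focus(step)
    raise AssertionError("unreachable: the loop always returns")


# Toy usage with scripted probabilities instead of a real network.
scripted = [[0.4, 0.6], [0.3, 0.7], [0.1, 0.9]]
pred, steps = glance_and_focus(
    lambda: scripted[0], lambda t: scripted[t], thresholds=[0.9, 0.65], max_steps=3
)
print(pred, steps)
```

Easy images exit after few steps and hard ones use the full sequence, which is where the average saving in computation would come from under this assumption.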

Results

Top-1 accuracy on ImageNet vs. Multiply-Adds

Top-1 accuracy on ImageNet vs. Inference Latency (ms) on an iPhone XS Max

Visualization

Pre-trained Models

Backbone CNNs	Patch Size	T	Links
ResNet-50	96x96	5	Tsinghua Cloud / Google Drive
ResNet-50	128x128	5	Tsinghua Cloud / Google Drive
DenseNet-121	96x96	5	Tsinghua Cloud / Google Drive
DenseNet-169	96x96	5	Tsinghua Cloud / Google Drive
DenseNet-201	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-600MF	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-800MF	96x96	5	Tsinghua Cloud / Google Drive
RegNet-Y-1.6GF	96x96	5	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.00)	96x96	3	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.00)	128x128	3	Tsinghua Cloud / Google Drive
MobileNet-V3-Large (1.25)	128x128	3	Tsinghua Cloud / Google Drive
EfficientNet-B2	128x128	4	Tsinghua Cloud / Google Drive
EfficientNet-B3	128x128	4	Tsinghua Cloud / Google Drive
EfficientNet-B3	144x144	4	Tsinghua Cloud / Google Drive

Contents of each checkpoint:

**.pth.tar
├── model_name: name of the backbone CNNs (e.g., resnet50, densenet121)
├── patch_size: size of image patches (i.e., H' or W' in the paper)
├── model_prime_state_dict, model_state_dict, fc, policy: state dictionaries of the four components of GFNets
├── model_flops, policy_flops, fc_flops: Multiply-Adds of one forward pass through the encoder, patch proposal network, and classifier
├── flops: a list containing the Multiply-Adds corresponding to each length of the input sequence during inference
├── anytime_classification: results of anytime prediction (in Top-1 accuracy)
├── dynamic_threshold: the confidence thresholds used in budgeted batch classification
├── budgeted_batch_classification: results of budgeted batch classification (a two-item list, [0] and [1] correspond to the two coordinates of a curve)
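The stored fields can be inspected without running any evaluation. The snippet below is a rough sketch: it uses only the key names listed above, and it assumes that `flops` and `anytime_classification` are sequences of equal length with one entry per sequence length. The helper names `load_checkpoint` and `summarize_checkpoint` are hypothetical.

```python
from typing import Any, Dict


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load a *.pth.tar checkpoint onto the CPU."""
    import torch  # imported lazily so summarize_checkpoint works without PyTorch

    return torch.load(path, map_location="cpu")


def summarize_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the metadata fields into a small summary dictionary."""
    flops = list(ckpt["flops"])
    accuracy = list(ckpt["anytime_classification"])
    return {
        "model_name": ckpt["model_name"],
        "patch_size": ckpt["patch_size"],
        "max_sequence_length": len(flops),
        # Assumption: entry i of both lists refers to a sequence of length i + 1.
        "anytime": list(zip(flops, accuracy)),
    }


# Usage (PATH_TO_CHECKPOINTS is a placeholder, as in the commands in this README):
# summary = summarize_checkpoint(load_checkpoint("PATH_TO_CHECKPOINTS"))
# for multiply_adds, top1 in summary["anytime"]:
#     print(multiply_adds, top1)
```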

Requirements

python 3.7.7
pytorch 1.3.1
torchvision 0.4.2
pyyaml 5.3.1 (for RegNets)

Evaluate Pre-trained Models

Read the evaluation results saved in pre-trained models

CUDA_VISIBLE_DEVICES=0 python inference.py --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 0

Read the confidence thresholds saved in pre-trained models and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 1

Determine confidence thresholds on the training set and run inference on the validation set

CUDA_VISIBLE_DEVICES=0 python inference.py --data_url PATH_TO_DATASET --checkpoint_path PATH_TO_CHECKPOINTS  --eval_mode 2

The dataset is expected to be prepared as follows:

ImageNet
├── train
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
├── val
│   ├── folder 1 (class 1)
│   ├── folder 2 (class 2)
│   ├── ...
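Before launching a long run, it may help to verify that the dataset directory matches this layout. The following is a minimal sketch using only the standard library. The function name `check_imagenet_layout` is hypothetical, and the requirement that `train` and `val` contain the same class folders is an assumption based on the one-folder-per-class layout shown above.

```python
from pathlib import Path
from typing import Dict, List


def check_imagenet_layout(root: str) -> Dict[str, List[str]]:
    """Return the sorted class-folder names found under train/ and val/."""
    classes: Dict[str, List[str]] = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each class is expected to live in its own sub-folder.
        names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
        if not names:
            raise ValueError(f"no class folders found in {split_dir}")
        classes[split] = names
    # Assumption: both splits should expose the same set of classes.
    if classes["train"] != classes["val"]:
        raise ValueError("train and val contain different class folders")
    return classes


# Usage (PATH_TO_DATASET is a placeholder, as in the commands in this README):
# found = check_imagenet_layout("PATH_TO_DATASET")
# print(len(found["train"]), "classes")
```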

Training

Here we take training ResNet-50 (96x96, T=5) as an example. All initialization models and stage-1/2 checkpoints can be found in Tsinghua Cloud / Google Drive. Currently, this link includes ResNet and MobileNet-V3; we will add the others as soon as possible. If you need other help, feel free to contact us.
The results in the paper are based on 2 Tesla V100 GPUs. For most experiments, up to 4 Titan Xp GPUs may be enough.

Training stage 1 requires the initializations of the global encoder (model_prime) and the local encoder (model):

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --data_url PATH_TO_DATASET --train_stage 1 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --model_prime_path PATH_TO_CHECKPOINTS  --model_path PATH_TO_CHECKPOINTS

Training stage 2 requires a stage-1 checkpoint:

CUDA_VISIBLE_DEVICES=0 python train.py --data_url PATH_TO_DATASET --train_stage 2 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --checkpoint_path PATH_TO_CHECKPOINTS

Training stage 3 requires a stage-2 checkpoint:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --data_url PATH_TO_DATASET --train_stage 3 --model_arch resnet50 --patch_size 96 --T 5 --print_freq 10 --checkpoint_path PATH_TO_CHECKPOINTS
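The three commands differ only in the stage number and in which checkpoint arguments they take, so they can be generated from one helper when scripting the pipeline. This is a sketch, not part of the repo: `stage_command` and the keys of `paths` are hypothetical, and only flags that appear in the commands above are used. Where `train.py` writes its output checkpoints is not stated here, so every path remains a caller-supplied placeholder.

```python
from typing import Dict, List


def stage_command(
    stage: int,
    data_url: str,
    paths: Dict[str, str],
    model_arch: str = "resnet50",
    patch_size: int = 96,
    T: int = 5,
    print_freq: int = 10,
) -> List[str]:
    """Build the argument list for one training stage (1, 2, or 3)."""
    cmd = [
        "python", "train.py",
        "--data_url", data_url,
        "--train_stage", str(stage),
        "--model_arch", model_arch,
        "--patch_size", str(patch_size),
        "--T", str(T),
        "--print_freq", str(print_freq),
    ]
    if stage == 1:
        # Stage 1 starts from the two encoder initializations.
        cmd += ["--model_prime_path", paths["model_prime"],
                "--model_path", paths["model"]]
    elif stage in (2, 3):
        # Stages 2 and 3 resume from the checkpoint of the previous stage.
        cmd += ["--checkpoint_path", paths["checkpoint"]]
    else:
        raise ValueError("stage must be 1, 2, or 3")
    return cmd


# Usage: print the stage-1 command with placeholder paths.
print(" ".join(stage_command(
    1, "PATH_TO_DATASET",
    {"model_prime": "PATH_TO_CHECKPOINTS", "model": "PATH_TO_CHECKPOINTS"},
)))
```

To actually launch a stage, the list could be passed to `subprocess.run(cmd, check=True)` with `CUDA_VISIBLE_DEVICES` set in the environment. Note that the stage-2 command above uses a single GPU, while stages 1 and 3 use four.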

Contact

If you have any questions, please feel free to contact the authors. Yulin Wang: [email protected].

Acknowledgment

Our code of MobileNet-V3 and EfficientNet is from here. Our code of RegNet is from here.

To Do

Update the code for visualizing.
Update the code for mixed precision training.

Owner

Rainforest Wang
