SOTA model in CIFAR10

Overview

A PyTorch Implementation of CIFAR Tricks

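This project surveys the tricks, data augmentation techniques, and regularization methods used on the CIFAR-10 dataset and implements them. The project is wrapped up for now. If you have better ideas or would like to help maintain it, please open an issue or find my contact information on my profile page.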

0. Requirements

  • Python 3.6+
  • torch=1.8.0+cu111
  • torchvision=0.9.0+cu111
  • tqdm=4.26.0
  • PyYAML=6.0

1. Implementations

1.1 Tricks

  • Warmup
  • Cosine LR Decay
  • SAM (Sharpness-Aware Minimization)
  • Label Smoothing
  • Knowledge Distillation (KD)
  • AdaBound
  • Xavier and Kaiming initialization
  • LR finder
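
As an illustration of one of the tricks listed above, here is a minimal sketch of label smoothing for a single sample, written in plain Python rather than PyTorch. The function name and the `smoothing=0.1` default are assumptions made for this example, not values taken from this repository.

```python
import math


def label_smoothing_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy of one sample against a smoothed one-hot target.

    Hypothetical helper for illustration only; not this project's code.
    """
    num_classes = len(logits)
    # Numerically stable log-softmax.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    log_probs = [z - log_sum_exp for z in logits]
    # Smoothed target: smoothing / K on every class,
    # plus the remaining (1 - smoothing) on the true class.
    uniform = smoothing / num_classes
    loss = 0.0
    for k, log_p in enumerate(log_probs):
        weight = uniform + ((1.0 - smoothing) if k == target else 0.0)
        loss -= weight * log_p
    return loss
```

With `smoothing=0` this reduces to ordinary cross-entropy. With a positive value, a very confident correct prediction is still penalized slightly, which discourages over-confident logits.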

1.2 Augmentation

  • AutoAugment
  • Cutout
  • Mixup
  • RICAP
  • Random Erasing
  • ShakeDrop
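
To make one of the augmentations above concrete, the following is a rough sketch of Cutout on a single-channel image stored as a list of rows. It is a plain-Python illustration under stated assumptions (an even patch `size`, zero fill, a patch that may be clipped at the image border) and is not the implementation used in this repository.

```python
import random


def cutout(image, size, rng=random):
    """Return a copy of a 2D image (list of rows) with one square patch zeroed.

    Hypothetical helper for illustration; `size` is assumed to be even.
    """
    height, width = len(image), len(image[0])
    # Pick the patch centre uniformly; the patch may be clipped at the borders.
    cy, cx = rng.randrange(height), rng.randrange(width)
    top, bottom = max(0, cy - size // 2), min(height, cy + size // 2)
    left, right = max(0, cx - size // 2), min(width, cx + size // 2)
    # Copy the rows so the input image is left untouched.
    out = [row[:] for row in image]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = 0.0
    return out
```

For a 32×32 CIFAR image and `size=16`, between 64 and 256 pixels are masked, depending on how close the centre falls to the border.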

2. Training

2.1 CIFAR-10 training examples

WideResNet28-10 baseline on CIFAR-10:

python train.py --dataset cifar10

WideResNet28-10 +RICAP on CIFAR-10:

python train.py --dataset cifar10 --ricap True

WideResNet28-10 +Random Erasing on CIFAR-10:

python train.py --dataset cifar10 --random-erase True

WideResNet28-10 +Mixup on CIFAR-10:

python train.py --dataset cifar10 --mixup True
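
For readers unfamiliar with Mixup, the sketch below shows the core idea on one pair of samples: inputs and one-hot labels are blended with a coefficient drawn from a Beta distribution. The function name, the flat-list data layout, and `alpha=1.0` are assumptions for this example. What `--mixup True` does inside `train.py` is not shown here.

```python
import random


def mixup_pair(x_a, y_a, x_b, y_b, alpha=1.0, rng=random):
    """Blend two samples and their one-hot labels (hypothetical helper).

    lam ~ Beta(alpha, alpha); returns the mixed input, mixed label, and lam.
    """
    lam = rng.betavariate(alpha, alpha)
    # The same convex combination is applied to inputs and labels.
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y, lam
```

In batched training code the second sample is usually taken from a shuffled copy of the same batch, and the loss is computed as the same weighted combination of the two label losses.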

3. Results

3.1 Results from the original pytorch-ricap

| Model | Error rate (accuracy), % | Loss | Error rate (paper), % |
| --- | --- | --- | --- |
| WideResNet28-10 baseline | 3.82 (96.18) | 0.158 | 3.89 |
| WideResNet28-10 + RICAP | 2.82 (97.18) | 0.141 | 2.85 |
| WideResNet28-10 + Random Erasing | 3.18 (96.82) | 0.114 | 4.65 |
| WideResNet28-10 + Mixup | 3.02 (96.98) | 0.158 | 3.02 |

3.2 Reimplementation results

| Model | Error rate (accuracy), % | Loss | Error rate (paper), % |
| --- | --- | --- | --- |
| WideResNet28-10 baseline | 3.78 (96.22) | | 3.89 |
| WideResNet28-10 + RICAP | 2.81 (97.19) | | 2.85 |
| WideResNet28-10 + Random Erasing | 3.03 (96.97) | 0.113 | 4.65 |
| WideResNet28-10 + Mixup | 2.93 (97.07) | 0.158 | 3.02 |

3.3 Quick validation of network architectures on half of the data

Reimplemented models (no augmentation, half of the data, 200 epochs, batch size 128)

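| Model | Error rate (accuracy), % | Loss | Notes |
| --- | --- | --- | --- |
| lenet | (70.76) | | CPU usage maxed out |
| wideresnet | 3.78 (96.22) | | |
| resnet20 | (89.72) | | |
| senet | (92.34) | | |
| resnet18 | (92.08) | | |
| resnet34 | (92.48) | | |
| resnet50 | (91.72) | | |
| regnet | (92.58) | | |
| nasnet | | | out of memory |
| shake_resnet26_2x32d | (93.06) | | |
| shake_resnet26_2x64d | (94.14) | | |
| densenet | (92.06) | | |
| dla | (92.58) | | |
| googlenet | (91.90) | 0.2675 | |
| efficientnetb0 | (86.82) | 0.5024 | low utilization and slow |
| mobilenet | (89.18) | | low utilization |
| mobilenetv2 | (91.06) | | |
| pnasnet | (90.44) | | |
| preact_resnet | (90.76) | | |
| resnext | (92.30) | | |
| vgg | (88.38) | | high CPU and GPU utilization |
| inceptionv3 | (91.84) | | |
| inceptionv4 | (91.10) | | |
| inception_resnet_v2 | (83.46) | | |
| rir | (92.34) | 0.3932 | |
| squeezenet | (89.16) | 0.4311 | high CPU utilization |
| stochastic_depth_resnet18 | (90.22) | | |
| xception | | | |
| dpn | (92.06) | 0.3002 | |
| ge_resnext29_8x64d | (93.86) | | extremely slow |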

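3.4 CPU and GPU utilization tests

TEST: scale/kernel ToyNet

Varying the depth of the network's convolutional layers and training each variant leads to the following conclusion:

Conclusion: a network such as LeNet, which has little convolutional computation (only two convolutional layers), shows high CPU utilization and low GPU utilization. When depth is added on top of it by stacking layers in the straight, VGG-like style, the deeper the network gets, the lower the CPU utilization and the higher the GPU utilization.

Varying the batch size used for training leads to the following conclusion:

Conclusion: the batch size affects convergence.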

3.5 Testing cutout and mixup with the StepLR schedule

| architecture | epoch | cutout | mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 200 | | | 96.33 |
| shake_resnet26_2x64d | 200 | | | 96.99 |
| shake_resnet26_2x64d | 200 | | | 96.60 |
| shake_resnet26_2x64d | 200 | | | 96.46 |

3.6 Testing SAM, ASAM, Cosine LR Decay, and LabelSmooth

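| architecture | epoch | SAM | ASAM | Cosine LR Decay | LabelSmooth | C10 test acc (%) |
| --- | --- | --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 200 | | | | | 96.51 |
| shake_resnet26_2x64d | 200 | | | | | 96.80 |
| shake_resnet26_2x64d | 200 | | | | | 96.61 |
| shake_resnet26_2x64d | 200 | | | | | 96.57 |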

PS: With a much longer training schedule (1800 epochs), another repository reports that shake_resnet26_2x64d reaches 97.71% test accuracy with cutout and mixup.

3.7 Testing cosine LR + shake

| architecture | epoch | cutout | mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 300 | | | 96.66 |
| shake_resnet26_2x64d | 300 | | | 97.21 |
| shake_resnet26_2x64d | 300 | | | 96.90 |
| shake_resnet26_2x64d | 300 | | | 96.73 |

Results at 1800 epochs as reported by CIFAR ZOO. They were not reproduced here because training takes too long.

| architecture | epoch | cutout | mixup | C10 test acc (%) |
| --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 1800 | | | 96.94 (CIFAR ZOO) |
| shake_resnet26_2x64d | 1800 | | | 97.20 (CIFAR ZOO) |
| shake_resnet26_2x64d | 1800 | | | 97.42 (CIFAR ZOO) |
| shake_resnet26_2x64d | 1800 | | | 97.71 (CIFAR ZOO) |

3.8 Study of the Divide and Co-training setup

  • lr:
    • warmup (20 epochs)
    • cosine LR decay
    • lr=0.1
    • total epochs: 300
  • bs=128
  • aug:
    • Random crop and resize
    • Random left-right flipping
    • AutoAugment
    • Normalization
    • Random Erasing
    • Mixup
  • weight decay=5e-4 (not applied to bias and BN parameters)
  • Kaiming weight init
  • optimizer: nesterov
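
The learning-rate settings above (20 warmup epochs, cosine decay, lr=0.1, 300 epochs in total) can be sketched as a per-epoch function. This is only an illustration: the linear warmup shape and decay towards zero are assumptions, and the exact behaviour of the project's `warmcosine` scheduler may differ.

```python
import math


def warmup_cosine_lr(epoch, base_lr=0.1, warmup_epochs=20, total_epochs=300):
    """Learning rate for a 0-indexed epoch (hypothetical helper).

    Assumes linear warmup followed by cosine decay towards zero.
    """
    if epoch < warmup_epochs:
        # Linear warmup from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay over the epochs that remain after warmup.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the rate starts at 0.005, reaches 0.1 at the end of warmup, passes 0.05 at epoch 160, and is close to zero in the final epoch.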

Reproduction (V100, gpu1; 4 min per epoch × 300 epochs / 60 = 20 h): top-1 accuracy 97.59%, the highest result in this project so far.

python train.py --model 'pyramidnet272' \
                --name 'divide-co-train' \
                --autoaugmentation True \
                --random-erase True \
                --mixup True \
                --epochs 300 \
                --sched 'warmcosine' \
                --optims 'nesterov' \
                --bs 128 \
                --root '/home/dpj/project/data'

3.9 Testing multiple data augmentations

| architecture | epoch | cutout | mixup | autoaugment | random-erase | C10 test acc (%) |
| --- | --- | --- | --- | --- | --- | --- |
| shake_resnet26_2x64d | 200 | | | | | 96.42 |
| shake_resnet26_2x64d | 200 | | | | | 96.49 |
| shake_resnet26_2x64d | 200 | | | | | 96.17 |
| shake_resnet26_2x64d | 200 | | | | | 96.25 |
| shake_resnet26_2x64d | 200 | | | | | 96.20 |
| shake_resnet26_2x64d | 200 | | | | | 95.82 |
| shake_resnet26_2x64d | 200 | | | | | 96.02 |
| shake_resnet26_2x64d | 200 | | | | | 96.00 |
| shake_resnet26_2x64d | 200 | | | | | 95.83 |
| shake_resnet26_2x64d | 200 | | | | | 95.89 |
| shake_resnet26_2x64d | 200 | | | | | 96.25 |

python train.py --model 'shake_resnet26_2x64d' --name 'ss64_orgin' --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_c' --cutout True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_m' --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_a' --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_r' --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cm' --cutout True --mixup True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ca' --cutout True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_cr' --cutout True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ma' --mixup True --autoaugmentation True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_mr' --mixup True --random-erase True --bs 64
python train.py --model 'shake_resnet26_2x64d' --name 'ss64_ar' --autoaugmentation True --random-erase True --bs 64
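
The eleven commands above cover the baseline, each augmentation on its own, and every pair of augmentations. If the sweep needs to be regenerated or extended, a small script along these lines could build the command strings. The helper name `build_commands` and the `FLAGS` table are hypothetical; only the flags and run names are taken from the commands above.

```python
from itertools import combinations

# (run-name tag, train.py flag) pairs, in the order used by the commands above.
FLAGS = [
    ("c", "--cutout"),
    ("m", "--mixup"),
    ("a", "--autoaugmentation"),
    ("r", "--random-erase"),
]


def build_commands(model="shake_resnet26_2x64d", batch_size=64, max_combined=2):
    """Build train.py command strings for all subsets of up to max_combined flags."""
    commands = []
    for k in range(max_combined + 1):
        for combo in combinations(FLAGS, k):
            # The baseline run keeps the name spelling used in the source ('orgin').
            suffix = "".join(tag for tag, _ in combo) or "orgin"
            flags = "".join(f" {flag} True" for _, flag in combo)
            commands.append(
                f"python train.py --model '{model}' --name 'ss64_{suffix}'{flags} --bs {batch_size}"
            )
    return commands


if __name__ == "__main__":
    for command in build_commands():
        print(command)
```

Raising `max_combined` to 3 or 4 would also enumerate the three-way and four-way combinations, which were not part of the runs listed above.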

4. Reference

[1] https://github.com/BIGBALLON/CIFAR-ZOO

[2] https://github.com/pprp/MutableNAS

[3] https://github.com/clovaai/CutMix-PyTorch

[4] https://github.com/4uiiurz1/pytorch-ricap

[5] https://github.com/NUDTNASLab/pytorch-image-models

[6] https://github.com/facebookresearch/LaMCTS

[7] https://github.com/Alibaba-MIIL/ImageNet21K

Owner
PJDong
Computer vision learner, deep learner