ML-Decoder: Scalable and Versatile Classification Head

Paper: https://arxiv.org/abs/2111.12933

Official PyTorch Implementation

Tal Ridnik, Gilad Sharir, Avi Ben-Cohen, Emanuel Ben-Baruch, Asaf Noy
DAMO Academy, Alibaba Group

Abstract

In this paper, we introduce ML-Decoder, a new attention-based classification head. ML-Decoder predicts the existence of class labels via queries, and enables better utilization of spatial data compared to global average pooling. By redesigning the decoder architecture and using a novel group-decoding scheme, ML-Decoder is highly efficient and can scale well to thousands of classes. Compared to using a larger backbone, ML-Decoder consistently provides a better speed-accuracy trade-off. ML-Decoder is also versatile - it can be used as a drop-in replacement for various classification heads, and it generalizes to unseen classes when operated with word queries. Novel query augmentations further improve its generalization ability. Using ML-Decoder, we achieve state-of-the-art results on several classification tasks: on MS-COCO multi-label, we reach 91.4% mAP; on NUS-WIDE zero-shot, we reach 31.1% ZSL mAP; and on ImageNet single-label, with a vanilla ResNet50 backbone, we reach a new top score of 80.7%, without extra data or distillation.

ML-Decoder Implementation

The ML-Decoder implementation is available here. It can be easily integrated into any backbone using this example code:

ml_decoder_head = MLDecoder(num_classes)  # initialization

spatial_embeddings = self.backbone(input_image)  # backbone generates spatial embeddings

logits = ml_decoder_head(spatial_embeddings)  # transform spatial embeddings to logits
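
For context, below is a minimal, self-contained sketch of one way that integration could look with a torchvision ResNet-50. The import path and the initial_num_features keyword are assumptions about this repo's MLDecoder module (check src_files for the exact signature); the backbone is truncated before global average pooling so it emits a spatial feature map.

import torch
import torch.nn as nn
from torchvision.models import resnet50
from src_files.ml_decoder.ml_decoder import MLDecoder  # import path assumed

class ResNet50MLDecoder(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        base = resnet50()
        # drop avgpool + fc so the backbone outputs a (B, 2048, H, W) feature map
        self.backbone = nn.Sequential(*list(base.children())[:-2])
        self.head = MLDecoder(num_classes, initial_num_features=2048)  # keyword assumed

    def forward(self, x):
        spatial_embeddings = self.backbone(x)   # (B, 2048, H, W)
        return self.head(spatial_embeddings)    # (B, num_classes) logits

model = ResNet50MLDecoder(num_classes=80)
logits = model(torch.randn(2, 3, 448, 448))    # -> torch.Size([2, 80])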

Training Code

We will share full reproduction code for the article's results.

Multi-label Training Code


Reproduction code for MS-COCO multi-label:

python train.py  \
--data=/home/datasets/coco2014/ \
--model_name=tresnet_l \
--image_size=448
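
Once trained, multi-label predictions come from thresholding per-class sigmoid probabilities rather than a softmax argmax. A generic sketch, reusing the model from the integration example above (the 0.5 threshold is a common default, not a value from the paper):

import torch

model.eval()  # e.g. the ResNet50MLDecoder sketched above
images = torch.randn(2, 3, 448, 448)
with torch.no_grad():
    probs = torch.sigmoid(model(images))  # independent per-class probabilities
    preds = probs > 0.5                   # boolean multi-hot predictions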

Single-label Training Code

Our single-label training code uses the excellent timm repo. Reproduction code currently lives in a fork; we are working toward a full merge into the main repo.

git clone https://github.com/mrT23/pytorch-image-models.git

This is the training command for the A2 configuration, with ML-Decoder enabled (--use-ml-decoder-head=1):

python -u -m torch.distributed.launch --nproc_per_node=8 \
--nnodes=1 \
--node_rank=0 \
./train.py \
/data/imagenet/ \
--amp \
-b=256 \
--epochs=300 \
--drop-path=0.05 \
--opt=lamb \
--weight-decay=0.02 \
--sched='cosine' \
--lr=4e-3 \
--warmup-epochs=5 \
--model=resnet50 \
--aa=rand-m7-mstd0.5-inc1 \
--reprob=0.0 \
--remode='pixel' \
--mixup=0.1 \
--cutmix=1.0 \
--aug-repeats 3 \
--bce-target-thresh 0.2 \
--smoothing=0 \
--bce-loss \
--train-interpolation=bicubic \
--use-ml-decoder-head=1
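
A note on the loss flags above: --bce-loss trains ImageNet with per-class binary cross-entropy instead of softmax cross-entropy, which pairs naturally with the soft targets produced by mixup and cutmix, and --bce-target-thresh 0.2 binarizes those soft targets at 0.2. A toy illustration of the idea (timm's internal implementation details may differ):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 1000)               # model outputs for a batch of 4
soft_targets = torch.rand(4, 1000) * 0.3    # stand-in for mixup/cutmix soft targets
targets = (soft_targets > 0.2).float()      # --bce-target-thresh 0.2: binarize at the threshold
loss = F.binary_cross_entropy_with_logits(logits, targets)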

ZSL Training Code

Reproduction code for ZSL is a work in progress.

Citation

@misc{ridnik2021mldecoder,
      title={ML-Decoder: Scalable and Versatile Classification Head}, 
      author={Tal Ridnik and Gilad Sharir and Avi Ben-Cohen and Emanuel Ben-Baruch and Asaf Noy},
      year={2021},
      eprint={2111.12933},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}