"Exploring Vision Transformers for Fine-grained Classification" at CVPRW FGVC8

Last update: Dec 06, 2022

Overview

FGVC8

The paper "Exploring Vision Transformers for Fine-grained Classification" was presented on June 25th at the Eighth Workshop on Fine-Grained Visual Categorization (FGVC8), held at CVPR 2021.

Abstract

Existing computer vision research in categorization struggles with fine-grained attribute recognition due to the inherently high intra-class variance and low inter-class variance. SOTA methods tackle this challenge by locating the most informative image regions and relying on them to classify the complete image. The most recent work, Vision Transformer (ViT), shows strong performance in both traditional and fine-grained classification tasks.

In this work, we propose a multi-stage ViT framework for fine-grained image classification tasks. It uses the inherent multi-head self-attention mechanism to localize the informative image regions, without requiring architectural changes. We also introduce attention-guided augmentations for improving the model's capabilities.
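To make the idea of attention-based localization more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names, the choice of a single layer's [CLS]-to-patch attention averaged over heads, the relative threshold, and the erase-style augmentation are all assumptions made for illustration, and the paper's actual aggregation across layers and heads may differ.

```python
import numpy as np


def cls_attention_grid(attn, grid_size):
    """Average [CLS]-to-patch attention over heads and reshape to a 2D grid.

    attn: array of shape (heads, tokens, tokens) from one self-attention
          layer, where token 0 is assumed to be the [CLS] token.
    """
    cls_to_patches = attn[:, 0, 1:]              # (heads, num_patches)
    mean_over_heads = cls_to_patches.mean(axis=0)
    return mean_over_heads.reshape(grid_size, grid_size)


def attention_bbox(grid, patch_size, rel_threshold=0.5):
    """Pixel box (top, left, bottom, right) around strongly attended patches.

    A patch counts as informative when its attention is at least
    rel_threshold times the maximum value in the grid (an assumed rule).
    """
    mask = grid >= rel_threshold * grid.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    top, bottom = rows[0] * patch_size, (rows[-1] + 1) * patch_size
    left, right = cols[0] * patch_size, (cols[-1] + 1) * patch_size
    return int(top), int(left), int(bottom), int(right)


def crop_attended(image, box):
    """Keep only the attended region (input for a hypothetical second stage)."""
    top, left, bottom, right = box
    return image[top:bottom, left:right]


def erase_attended(image, box, fill=0):
    """Hypothetical attention-guided augmentation: blank out the attended region."""
    top, left, bottom, right = box
    out = image.copy()
    out[top:bottom, left:right] = fill
    return out


# Toy example: 2 heads, a 4x4 patch grid (16 patches + [CLS]), 16-pixel patches.
attn = np.full((2, 17, 17), 0.01)
for row, col, weight in [(1, 1, 0.5), (1, 2, 0.4), (2, 2, 0.3)]:
    attn[:, 0, 1 + row * 4 + col] = weight

grid = cls_attention_grid(attn, grid_size=4)
box = attention_bbox(grid, patch_size=16)
image = np.ones((64, 64, 3))
crop = crop_attended(image, box)
erased = erase_attended(image, box)
```

In this sketch the crop could be fed back to the same, unmodified ViT as a second stage, while the erased image could serve as a harder training sample that pushes the model to look at other regions. Both uses are illustrative readings of the abstract rather than a description of the released code.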

We demonstrate the value of our approach on four popular fine-grained benchmarks: CUB-200-2011, Stanford Cars, Stanford Dogs, and FGVC7 Plant Pathology. We also illustrate our model's interpretability via qualitative results.

Instructions

Upcoming

Citation

If you find our results interesting, or if you use our code or ideas, please consider citing our work:

@misc{conde2021exploring,
      title={Exploring Vision Transformers for Fine-grained Classification}, 
      author={Marcos V. Conde and Kerem Turgutlu},
      year={2021},
      eprint={2106.10587},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


Owner

Marcos V. Conde
