This repo accompanies the paper "Attention Mechanisms in Computer Vision: A Survey".

Vision-Attention-Papers

🔥 marks papers with more than 200 citations.

TODO: Code for the different attention mechanisms will come soon.
TODO: Code links will come soon.
TODO: Collect more related papers. Contributions are welcome.

Channel attention

Squeeze-and-Excitation Networks(CVPR2018) pdf, (PAMI2019 version) pdf 🔥
Image super-resolution using very deep residual channel attention networks(ECCV2018) pdf 🔥
Context encoding for semantic segmentation(CVPR2018) pdf 🔥
Spatio-temporal channel correlation networks for action classification(ECCV2018) pdf
Global second-order pooling convolutional networks(CVPR2019) pdf
SRM: A style-based recalibration module for convolutional neural networks(ICCV2019) pdf
You look twice: Gaternet for dynamic filter selection in cnns(CVPR2019) pdf
Second-order attention network for single image super-resolution(CVPR2019) pdf 🔥
Spsequencenet: Semantic segmentation network on 4d point clouds(CVPR2020) pdf
Ecanet: Efficient channel attention for deep convolutional neural networks (CVPR2020) pdf 🔥
Gated channel transformation for visual recognition(CVPR2020) pdf
Fcanet: Frequency channel attention networks(ICCV2021) pdf
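
As a rough illustration of the channel attention family listed above, here is a minimal pure-Python sketch in the spirit of the squeeze-and-excitation block: global average pooling per channel, a two-layer bottleneck, and a sigmoid gate that rescales each channel. It is not the authors' implementation. The function name `se_channel_attention`, the nested-list tensor layout, and the bias-free weights `w1` and `w2` are assumptions made for this example.

```python
import math


def se_channel_attention(x, w1, w2):
    """SE-style channel attention on a feature map (illustrative sketch).

    x  : list of C channels, each an H x W nested list of floats.
    w1 : (C // r) x C weight matrix of the reduction layer (hypothetical, no bias).
    w2 : C x (C // r) weight matrix of the expansion layer (hypothetical, no bias).
    Returns the rescaled feature map and the per-channel gates.
    """
    # Squeeze: global average pooling yields one scalar per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    out = [[[g * v for v in row] for row in ch] for ch, g in zip(x, gates)]
    return out, gates
```

Calling `se_channel_attention(x, w1, w2)` with a two-channel map, a 1 x 2 `w1`, and a 2 x 1 `w2` returns one gate per channel, each strictly between 0 and 1. In a real network the weights would be learned, and a tensor library would replace the nested lists.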

Spatial attention

Recurrent models of visual attention(NeurIPS2014) pdf 🔥
Show, attend and tell: Neural image caption generation with visual attention(PMLR2015) pdf 🔥
Draw: A recurrent neural network for image generation(ICML2015) pdf 🔥
Spatial transformer networks(NeurIPS2015) pdf 🔥
Multiple object recognition with visual attention(ICLR2015) pdf 🔥
Action recognition using visual attention(arXiv2015) pdf 🔥
Videolstm convolves, attends and flows for action recognition(arXiv2016) pdf 🔥
Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition(CVPR2017) pdf 🔥
Learning multi-attention convolutional neural network for fine-grained image recognition(ICCV2017) pdf 🔥
Diversified visual attention networks for fine-grained object classification(TMM2017) pdf 🔥
Attentional pooling for action recognition(NeurIPS2017) pdf 🔥
Non-local neural networks(CVPR2018) pdf 🔥
Attentional shapecontextnet for point cloud recognition(CVPR2018) pdf
Relation networks for object detection(CVPR2018) pdf 🔥
a2-nets: Double attention networks(NeurIPS2018) pdf 🔥
Attention-aware compositional network for person re-identification(CVPR2018) pdf 🔥
Tell me where to look: Guided attention inference network(CVPR2018) pdf 🔥
Pedestrian alignment network for large-scale person re-identification(TCSVT2018) pdf 🔥
Learn to pay attention(ICLR2018) pdf 🔥
Attention U-Net: Learning Where to Look for the Pancreas(MIDL2018) pdf 🔥
Psanet: Point-wise spatial attention network for scene parsing(ECCV2018) pdf 🔥
Self attention generative adversarial networks(ICML2019) pdf 🔥
Attentional pointnet for 3d-object detection in point clouds(CVPRW2019) pdf
Co-occurrent features in semantic segmentation(CVPR2019) pdf
Attention augmented convolutional networks(ICCV2019) pdf 🔥
Local relation networks for image recognition(ICCV2019) pdf
Latentgnn: Learning efficient nonlocal relations for visual recognition(ICML2019) pdf
Graph-based global reasoning networks(CVPR2019) pdf 🔥
Gcnet: Non-local networks meet squeeze-excitation networks and beyond(ICCVW2019) pdf 🔥
Asymmetric non-local neural networks for semantic segmentation(ICCV2019) pdf 🔥
Looking for the devil in the details: Learning trilinear attention sampling network for fine-grained image recognition(CVPR2019) pdf
Second-order non-local attention networks for person re-identification(ICCV2019) pdf 🔥
End-to-end comparative attention networks for person re-identification(ICCV2019) pdf 🔥
Modeling point clouds with self-attention and gumbel subset sampling(CVPR2019) pdf
Diagnose like a radiologist: Attention guided convolutional neural network for thorax disease classification(arXiv 2019) pdf
L2g autoencoder: Understanding point clouds by local-to-global reconstruction with hierarchical self-attention(arXiv 2019) pdf
Generative pretraining from pixels(PMLR2020) pdf
Exploring self-attention for image recognition(CVPR2020) pdf
Cf-sis: Semantic-instance segmentation of 3d point clouds by context fusion with self attention(ACM MM 2020) pdf
Disentangled non-local neural networks(ECCV2020) pdf
Relation-aware global attention for person re-identification(CVPR2020) pdf
Segmentation transformer: Object-contextual representations for semantic segmentation(ECCV2020) pdf 🔥
Spatial pyramid based graph reasoning for semantic segmentation(CVPR2020) pdf
Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation(CVPR2020) pdf
End-to-end object detection with transformers(ECCV2020) pdf 🔥
Pointasnl: Robust point clouds processing using nonlocal neural networks with adaptive sampling(CVPR2020) pdf
Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers(CVPR2021) pdf
An image is worth 16x16 words: Transformers for image recognition at scale(ICLR2021) pdf 🔥
Ocnet: Object context network for scene parsing(IJCV 2021) pdf 🔥
Point transformer(ICCV 2021) pdf
PCT: Point Cloud Transformer (CVMJ 2021) pdf
Pre-trained image processing transformer(CVPR 2021) pdf
An empirical study of training self-supervised vision transformers(ICCV 2021) pdf
Segformer: Simple and efficient design for semantic segmentation with transformers(arxiv 2021) pdf
Beit: Bert pre-training of image transformers(arxiv 2021) pdf
Beyond self-attention: External attention using two linear layers for visual tasks(arxiv 2021) pdf
Query2label: A simple transformer way to multi-label classification(arxiv 2021) pdf
Transformer in transformer(arxiv 2021) pdf
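
Several entries above (for example the non-local networks and the vision transformers) build on dot-product self-attention across spatial positions. The sketch below shows that core operation in pure Python, under simplifying assumptions: a single head, no learned projections, and queries, keys, and values passed in directly as lists of vectors. The helper names `softmax`, `dot`, and `self_attention` are chosen for this example and do not come from any of the listed papers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def self_attention(queries, keys, values):
    """Scaled dot-product attention (illustrative sketch).

    Each argument is a list of vectors, one vector per spatial position.
    Every output vector is a weighted average of all value vectors, with
    weights given by the softmax of the scaled query-key similarities.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out
```

For an image feature map, the H x W grid would first be flattened into N = H * W positions. The cost grows quadratically with N, which is one motivation for the more efficient variants in this list.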

Temporal attention

Jointly attentive spatial-temporal pooling networks for video-based person re-identification (ICCV 2017) pdf 🔥
Video person re-identification with competitive snippet-similarity aggregation and co-attentive snippet embedding(CVPR 2018) pdf
Scan: Self-and-collaborative attention network for video person re-identification (TIP 2019) pdf

Branch attention

Training very deep networks (NeurIPS 2015) pdf 🔥
Selective kernel networks (CVPR 2019) pdf 🔥
CondConv: Conditionally Parameterized Convolutions for Efficient Inference (NeurIPS 2019) pdf
Dynamic convolution: Attention over convolution kernels (CVPR 2020) pdf
ResNest: Split-attention networks (arXiv 2020) pdf 🔥

Channel & Spatial attention

Residual attention network for image classification (CVPR 2017) pdf 🔥
SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning (CVPR 2017) pdf 🔥
CBAM: convolutional block attention module (ECCV 2018) pdf 🔥
Harmonious attention network for person re-identification (CVPR 2018) pdf 🔥
Recalibrating fully convolutional networks with spatial and channel “squeeze and excitation” blocks (TMI 2018) pdf
Mancs: A multi-task attentional network with curriculum sampling for person re-identification (ECCV 2018) pdf 🔥
Bam: Bottleneck attention module(BMVC 2018) pdf 🔥
Pvnet: A joint convolutional network of point cloud and multi-view for 3d shape recognition (ACM MM 2018) pdf
Learning what and where to attend (ICLR 2019) pdf
Dual attention network for scene segmentation (CVPR 2019) pdf 🔥
Abd-net: Attentive but diverse person re-identification (ICCV 2019) pdf
Mixed high-order attention network for person re-identification (ICCV 2019) pdf
Mlcvnet: Multi-level context votenet for 3d object detection (CVPR 2020) pdf
Improving convolutional networks with self-calibrated convolutions (CVPR 2020) pdf
Relation-aware global attention for person re-identification (CVPR 2020) pdf
Strip Pooling: Rethinking spatial pooling for scene parsing (CVPR 2020) pdf
Rotate to attend: Convolutional triplet attention module (WACV 2021) pdf
Coordinate attention for efficient mobile network design (CVPR 2021) pdf
Simam: A simple, parameter-free attention module for convolutional neural networks (ICML 2021) pdf

Spatial & Temporal attention

An end-to-end spatio-temporal attention model for human action recognition from skeleton data(AAAI 2017) pdf 🔥
Diversity regularized spatiotemporal attention for video-based person re-identification (ArXiv 2018) 🔥
Interpretable spatio-temporal attention for video action recognition (ICCVW 2019) pdf
Hierarchical lstms with adaptive attention for visual captioning (TPAMI 2020) pdf
Stat: Spatial-temporal attention mechanism for video captioning (TMM 2020) pdf
Gta: Global temporal attention for video action understanding (ArXiv 2020) pdf
Multi-granularity reference-aided attentive feature aggregation for video-based person re-identification (CVPR 2020) pdf
Read: Reciprocal attention discriminator for image-to-video re-identification (ECCV 2020) pdf
Decoupled spatial-temporal transformer for video inpainting (ArXiv 2021) pdf


Owner

MenghaoGuo