PyTorch implementation of ''Background Activation Suppression for Weakly Supervised Object Localization''.

Last update: Jan 06, 2023

Related tags

Overview

Background Activation Suppression for Weakly Supervised Object Localization

PyTorch implementation of ''Background Activation Suppression for Weakly Supervised Object Localization''. This repository contains PyTorch training code, inference code and pretrained models.

📎 Paper Link

Background Activation Suppression for Weakly Supervised Object Localization (link)

Authors: Pingyu Wu*, Wei Zhai*, Yang Cao
Institution: University of Science and Technology of China (USTC)

💡 Abstract

Weakly supervised object localization (WSOL) aims to localize the object region using only image-level labels as supervision. Recently a new paradigm has emerged by generating a foreground prediction map (FPM) to achieve the localization task. Existing FPM-based methods use cross-entropy (CE) to evaluate the foreground prediction map and to guide the learning of generator. We argue for using activation value to achieve more efficient learning. It is based on the experimental observation that, for a trained network, CE converges to zero when the foreground mask covers only part of the object region. While activation value increases until the mask expands to the object boundary, which indicates that more object areas can be learned by using activation value. In this paper, we propose a Background Activation Suppression (BAS) method. Specifically, an Activation Map Constraint module (AMC) is designed to facilitate the learning of generator by suppressing the background activation values. Meanwhile, by using the foreground region guidance and the area constraint, BAS can learn the whole region of the object. Furthermore, in the inference phase, we consider the prediction maps of different categories together to obtain the final localization results. Extensive experiments show that BAS achieves significant and consistent improvement over the baseline methods on the CUB-200-2011 and ILSVRC datasets.

✨ Motivation

Motivation. (A) The entroy value of CE loss $w.r.t$ foreground mask and foreground activation value $w.r.t$ foreground mask. To illustrate the generality of this phenomenon, more examples are shown in the subfigure on the right. (B) Experimental procedure and related definitions. Implementation details of the experiment and further results are available in the Supplementary Material.

Exploratory Experiment

We introduce the implementation of the experiment, as shown in Fig. \ref{Exploratory Experiment} (A). For a given GT binary mask, the activation value (Activation) and cross-entropy (Entropy) corresponding to this mask are generated by masking the feature map. We erode and dilate the ground-truth mask with a convolution of kernel size $5n \times 5n$, obtain foreground masks with different area sizes by changing the value of $n$, and plot the activation value versus cross-entropy with the area as the horizontal axis, as shown in Fig. \ref{Exploratory Experiment} (B). By inverting the foreground mask, the corresponding background activation values for the foreground mask area are generated in the same way. In Fig. \ref{Exploratory Experiment} (C), we show the curves of entropy, foreground activation, and background activation with mask area. It can be noticed that both background activation and foreground activation values have a higher correlation with the mask compared to the entropy. We show more examples in the Supplementary Material.

Exploratory Experiment. Examples about the entroy value of CE loss $w.r.t$ foreground mask and foreground activation value $w.r.t$ foreground mask.

📖 Method

The architecture of the proposed BAS. In the training phase, the class-specific foreground prediction map $F^{fg}$ and the coupled background prediction map $F^{bg}$ are obtained by the generator, and then fed into the activation map constraint module together with the feature map $F$. In the inference phase, we utilize Top-k to generate the final localization map.

📃 Requirements

python 3.6.10
torch 1.4.0
torchvision 0.5.0
opencv 4.5.3

✏️ Usage

Start

git clone https://github.com/wpy1999/BAS.git
cd BAS

Download Datasets

Training

We will release our training code upon acceptance.

Inference

To test the CUB models, you can download the trained models from [ Google Drive (VGG16) ], [ Google Drive (Mobilenetv1) ], [ Google Drive (ResNet50) ], [ Google Drive (Inceptionv3) ], then run BAS_inference.py:

cd CUB
python BAS_inference.py --arch vgg

To test the ILSVRC models, you can download the trained models from [ Google Drive (VGG16) ], [ Google Drive (Mobilenetv1) ], [ Google Drive (ResNet50) ], [ Google Drive (Inceptionv3) ], then run BAS_inference.py:

cd ILSVRC
python BAS_inference.py --arch vgg

📊 Experimental Results

✉️ Statement

This project is for research purpose only, please contact us for the licence of commercial use. For any other questions please contact [email protected] or [email protected].

🔍 Citation

@inproceedings{BAS,
  title={Background Activation Suppression for Weakly Supervised Object Localization},
  author={Pingyu Wu and Wei Zhai and Yang Cao},
  booktitle={xxx},
  year={2021}
}

PyTorch implementation of ''Background Activation Suppression for Weakly Supervised Object Localization''.

Related tags

Overview

Background Activation Suppression for Weakly Supervised Object Localization

📋 Table of content

📎 Paper Link

💡 Abstract

✨ Motivation

Exploratory Experiment

📖 Method

📃 Requirements

✏️ Usage

Start

Download Datasets

Training

Inference

📊 Experimental Results

✉️ Statement

🔍 Citation

Owner

A method to perform unsupervised cross-region adaptation of crop classifiers trained with satellite image time series.

HNECV: Heterogeneous Network Embedding via Cloud model and Variational inference

Pytorch implementation of "Geometrically Adaptive Dictionary Attack on Face Recognition" (WACV 2022)

ChainerRL is a deep reinforcement learning library built on top of Chainer.

VideoGPT: Video Generation using VQ-VAE and Transformers

Image segmentation with private İstanbul Dataset

Official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model.

simple demo codes for Learning to Teach with Dynamic Loss Functions

Learning to Identify Top Elo Ratings with A Dueling Bandits Approach

"Graph Neural Controlled Differential Equations for Traffic Forecasting", AAAI 2022

Code, pre-trained models and saliency results for the paper "Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images".

Contrastive Language-Image Pretraining

[CVPR 2022] PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision (Oral)

Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

Transfer-Learn is an open-source and well-documented library for Transfer Learning.

ReConsider is a re-ranking model that re-ranks the top-K (passage, answer-span) predictions of an Open-Domain QA Model like DPR (Karpukhin et al., 2020).

[AAAI 2021] MVFNet: Multi-View Fusion Network for Efficient Video Recognition

Extremely easy multi instancing software for minecraft speedrunning.

ULMFiT for Genomic Sequence Data