source code of “Visual Saliency Transformer” (ICCV2021)

Last update: Dec 21, 2022

Related tags

Overview

Visual Saliency Transformer (VST)

source code for our ICCV 2021 paper “Visual Saliency Transformer” by Nian Liu, Ni Zhang, Kaiyuan Wan, Junwei Han, and Ling Shao.

created by Ni Zhang, email: [email protected]

Requirement

Pytorch 1.6.0
Torchvison 0.7.0

RGB VST for RGB Salient Object Detection

Data Preparation

Training Set

We use the training set of DUTS to train our VST for RGB SOD. Besides, we follow Egnet to generate contour maps of DUTS trainset for training. You can directly download the generated contour maps DUTS-TR-Contour from [baidu pan fetch code: ow76 | Google drive] and put it into RGB_VST/Data folder.

Testing Set

We use the testing set of DUTS, ECSSD, HKU-IS, PASCAL-S, DUT-O, and SOD to test our VST. After Downloading, put them into RGB_VST/Data folder.

Your RGB_VST/Data folder should look like this:

-- Data
   |-- DUTS
   |   |-- DUTS-TR
   |   |-- | DUTS-TR-Image
   |   |-- | DUTS-TR-Mask
   |   |-- | DUTS-TR-Contour
   |   |-- DUTS-TE
   |   |-- | DUTS-TE-Image
   |   |-- | DUTS-TE-Mask
   |-- ECSSD
   |   |--images
   |   |--GT
   ...

Training, Testing, and Evaluation

cd RGB_VST
Download the pretrained T2T-ViT_t-14 model [baidu pan fetch code: 2u34 | Google drive] and put it into pretrained_model/ folder.
Run python train_test_eval.py --Training True --Testing True --Evaluation True for training, testing, and evaluation. The predictions will be in preds/ folder and the evaluation results will be in result.txt file.

Testing on Our Pretrained RGB VST Model

cd RGB_VST
Download our pretrained RGB_VST.pth[baidu pan fetch code: pe54 | Google drive] and then put it in checkpoint/ folder.
Run python train_test_eval.py --Testing True --Evaluation True for testing and evaluation. The predictions will be in preds/ folder and the evaluation results will be in result.txt file.

Our saliency maps can be downloaded from [baidu pan fetch code: 92t0 | Google drive].

SOTA Saliency Maps for Comparison

The saliency maps of the state-of-the-art methods in our paper can be downloaded from [baidu pan fetch code: de4k | Google drive].

RGB-D VST for RGB-D Salient Object Detection

Data Preparation

Training Set

We use 1,485 images from NJUD, 700 images from NLPR, and 800 images from DUTLF-Depth to train our VST for RGB-D SOD. Besides, we follow Egnet to generate corresponding contour maps for training. You can directly download the whole training set from here [baidu pan fetch code: 7vsw | Google drive] and put it into RGBD_VST/Data folder.

Testing Set

NJUD [baidu pan fetch code: 7mrn | Google drive]
NLPR [baidu pan fetch code: tqqm | Google drive]
DUTLF-Depth [baidu pan fetch code: 9jac | Google drive]
STERE [baidu pan fetch code: 93hl | Google drive]
LFSD [baidu pan fetch code: l2g4 | Google drive]
RGBD135 [baidu pan fetch code: apzb | Google drive]
SSD [baidu pan fetch code: j3v0 | Google drive]
SIP [baidu pan fetch code: q0j5 | Google drive]
ReDWeb-S

After Downloading, put them into RGBD_VST/Data folder.

Your RGBD_VST/Data folder should look like this:

-- Data
   |-- NJUD
   |   |-- trainset
   |   |-- | RGB
   |   |-- | depth
   |   |-- | GT
   |   |-- | contour
   |   |-- testset
   |   |-- | RGB
   |   |-- | depth
   |   |-- | GT
   |-- STERE
   |   |-- RGB
   |   |-- depth
   |   |-- GT
   ...

Training, Testing, and Evaluation

cd RGBD_VST
Download the pretrained T2T-ViT_t-14 model [baidu pan fetch code: 2u34 | Google drive] and put it into pretrained_model/ folder.
Run python train_test_eval.py --Training True --Testing True --Evaluation True for training, testing, and evaluation. The predictions will be in preds/ folder and the evaluation results will be in result.txt file.

Testing on Our Pretrained RGB-D VST Model

cd RGBD_VST
Download our pretrained RGBD_VST.pth[baidu pan fetch code: zt0v | Google drive] and then put it in checkpoint/ folder.
Run python train_test_eval.py --Testing True --Evaluation True for testing and evaluation. The predictions will be in preds/ folder and the evaluation results will be in result.txt file.

Our saliency maps can be downloaded from [baidu pan fetch code: jovk | Google drive].

SOTA Saliency Maps for Comparison

The saliency maps of the state-of-the-art methods in our paper can be downloaded from [baidu pan fetch code: i1we | Google drive].

Acknowledgement

We thank the authors of Egnet for providing codes of generating contour maps. We also thank Zhao Zhang for providing the efficient evaluation tool.

Citation

If you think our work is helpful, please cite

@inproceedings{liu2021VST, 
  title={Visual Saliency Transformer}, 
  author={Liu, Nian and Zhang, Ni and Han, Junwei and Shao, Ling},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2021}
}

source code of “Visual Saliency Transformer” (ICCV2021)

Related tags

Overview

Visual Saliency Transformer (VST)

Requirement

RGB VST for RGB Salient Object Detection

Data Preparation

Training Set

Testing Set

Training, Testing, and Evaluation

Testing on Our Pretrained RGB VST Model

SOTA Saliency Maps for Comparison

RGB-D VST for RGB-D Salient Object Detection

Data Preparation

Training Set

Testing Set

Training, Testing, and Evaluation

Testing on Our Pretrained RGB-D VST Model

SOTA Saliency Maps for Comparison

Acknowledgement

Citation

Owner

[arXiv'22] Panoptic NeRF: 3D-to-2D Label Transfer for Panoptic Urban Scene Segmentation

City-Scale Multi-Camera Vehicle Tracking Guided by Crossroad Zones Code

[NeurIPS-2020] Self-paced Contrastive Learning with Hybrid Memory for Domain Adaptive Object Re-ID.

Learn about Spice.ai with in-depth samples

FaceVerse: a Fine-grained and Detail-controllable 3D Face Morphable Model from a Hybrid Dataset (CVPR2022)

Re-implement CycleGAN in Tensorlayer

SimDeblur is a simple framework for image and video deblurring, implemented by PyTorch

Unity Propagation in Bayesian Networks Handling Inconsistency via Unity Smoothing

Adversarial-Information-Bottleneck - Distilling Robust and Non-Robust Features in Adversarial Examples by Information Bottleneck (NeurIPS21)

Airbus Ship Detection Challenge

Spatial Temporal Graph Convolutional Networks (ST-GCN) for Skeleton-Based Action Recognition in PyTorch

BED: A Real-Time Object Detection System for Edge Devices

Detecting Human-Object Interactions with Object-Guided Cross-Modal Calibrated Semantics

PyTorch implementation for NED. It can be used to manipulate the facial emotions of actors in videos based on emotion labels or reference styles.

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

bio_inspired_min_nets_improve_the_performance_and_robustness_of_deep_networks

This is a repository for a Semantic Segmentation inference API using the Gluoncv CV toolkit

GoodNews Everyone! Context driven entity aware captioning for news images

A foreign language learning aid using a neural network to predict probability of translating foreign words

This was initially the repo for the project of [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media"