[ICCV 2021 Oral] Deep Evidential Action Recognition

DEAR (Deep Evidential Action Recognition)

Project | Paper & Supp

Wentao Bao, Qi Yu, Yu Kong

International Conference on Computer Vision (ICCV Oral), 2021.

Table of Contents

  1. Introduction
  2. Installation
  3. Datasets
  4. Testing
  5. Training
  6. Model Zoo
  7. Citation

Introduction

We propose Deep Evidential Action Recognition (DEAR), a method for recognizing actions in an open world. Specifically, we formulate action recognition from the evidential deep learning (EDL) perspective and propose a novel model calibration method to regularize EDL training. In addition, to mitigate the static bias of video representations, we propose a plug-and-play module that debiases the learned representation through contrastive learning. Trained on UCF-101, DEAR achieves significant and consistent performance gains across multiple action recognition models (I3D, TSM, SlowFast, and TPN), with HMDB-51 or MiT-v2 used as the unknown dataset.
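To make the EDL perspective concrete, here is a minimal sketch of how per-class evidence can be turned into class probabilities and a single uncertainty score under the standard Dirichlet-based EDL formulation. It is not taken from this repo: the function name `edl_outputs`, the clamped exponential used as the evidence function, and the clamp value are assumptions chosen for illustration.

```python
import math


def edl_outputs(logits, clamp=10.0):
    """Map raw network outputs to (class probabilities, uncertainty).

    Assumption: evidence = exp(clamped logit). Other non-negative
    functions (ReLU, softplus) are equally valid choices in EDL.
    """
    # Non-negative evidence for each class.
    evidence = [math.exp(max(-clamp, min(clamp, z))) for z in logits]
    # Dirichlet parameters: alpha_k = evidence_k + 1.
    alpha = [e + 1.0 for e in evidence]
    # Dirichlet strength S = sum of all alpha_k.
    strength = sum(alpha)
    # Expected class probabilities under the Dirichlet.
    probs = [a / strength for a in alpha]
    # Uncertainty u = K / S, where K is the number of classes.
    uncertainty = len(logits) / strength
    return probs, uncertainty


if __name__ == "__main__":
    print(edl_outputs([5.0, -5.0, -5.0]))   # strong evidence for class 0
    print(edl_outputs([-5.0, -5.0, -5.0]))  # almost no evidence at all
```

With K classes, the uncertainty K / S approaches 1 when the total evidence is near zero and shrinks as evidence accumulates, which is what makes it usable as a score for rejecting unknown actions.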

Demo

The following figures show inference results of the SlowFast + DEAR model trained on the UCF-101 dataset.

[Figures: four UCF-101 (Known) examples and four HMDB-51 (Unknown) examples]

Installation

This repo is built on the MMAction2 codebase. Because MMAction2 is updated frequently, the requirements and installation steps below follow MMAction2 v0.9.0.

Requirements and Dependencies

We list only the software and hardware we used. Contributions that make this code work with newer versions of the listed dependencies, or with the latest MMAction2 codebase, are welcome.

  • Linux: Ubuntu 18.04 LTS
  • GPU: GeForce RTX 3090, A100-SXM4
  • CUDA: 11.0
  • GCC: 7.5
  • Python: 3.7.9
  • Anaconda: 4.9.2
  • PyTorch: 1.7.1+cu110
  • TorchVision: 0.8.2+cu110
  • OpenCV: 4.4.0
  • MMCV: 1.2.1
  • MMAction2: 0.9.0

Installation Steps

The following steps are adapted from the MMAction2 (v0.9.0) installation document. If you encounter problems, refer to the official document for more details, or raise an issue in this repo.

a. Create a conda virtual environment for this repo, and activate it:

conda create -n mmaction python=3.7 -y
conda activate mmaction

b. Install PyTorch and TorchVision following the official instructions, e.g.,

conda install pytorch=1.7.1 cudatoolkit=11.0 torchvision=0.8.2 -c pytorch

c. Install mmcv. We recommend installing the pre-built mmcv-full package as below.

pip install mmcv-full==1.2.1 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.1/index.html

Important: If you have already installed mmcv and want to install mmcv-full, you must first uninstall mmcv by running pip uninstall mmcv. Otherwise, you will get a ModuleNotFoundError.

d. Clone the source code of this repo:

git clone https://github.com/Cogito2012/DEAR.git mmaction2
cd mmaction2

e. Install build requirements and then install DEAR.

pip install -r requirements/build.txt
pip install -v -e .  # or "python setup.py develop"

If no error appears in your installation steps, then you are all set!
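As an optional sanity check after installation, the sketch below tries to import the main packages and prints the versions it finds next to the versions listed in the requirements above. The helper name `installed_versions` and the `EXPECTED` table are hypothetical and not part of this repo. Any import failure is reported as "not importable" rather than raised.

```python
import importlib

# Versions listed in "Requirements and Dependencies" (import name -> version).
EXPECTED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "mmcv": "1.2.1",
    "mmaction": "0.9.0",
}


def installed_versions(packages=EXPECTED):
    """Return {import name: version string, or None if the import fails}."""
    report = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except Exception:
            # A missing or broken install is recorded instead of raised.
            report[name] = None
    return report


if __name__ == "__main__":
    for name, found in installed_versions().items():
        if found is None:
            status = "not importable"
        else:
            status = "found " + str(found)
        print(f"{name}: expected {EXPECTED[name]}, {status}")
```

Run it inside the activated conda environment. A build suffix such as +cu110 in the reported PyTorch version is normal for CUDA builds.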

Datasets

This repo uses standard video action datasets: UCF-101 for closed set training, and the HMDB-51 and MiT-v2 test sets as two different unknowns. Please follow the default MMAction2 dataset setup steps to set up these three datasets correctly.

Note: You can skip Step 3 (Extract RGB and Flow) in the referenced setup steps, since none of the code related to our paper relies on extracted frames or optical flow. This will save you a large amount of disk space!

Testing

To test our pre-trained models (see the Model Zoo), download a model file and unzip it under work_dirs. We take the I3D-based DEAR model as an example. First, download the pre-trained I3D-based models; the full DEAR model is saved in the folder finetune_ucf101_i3d_edlnokl_avuc_debias. Place the downloaded files according to the following directory tree.

work_dirs    
├── i3d
│    ├── finetune_ucf101_i3d_bnn
│    │   └── latest.pth
│    ├── finetune_ucf101_i3d_dnn
│    │   └── latest.pth
│    ├── finetune_ucf101_i3d_edlnokl
│    │   └── latest.pth
│    ├── finetune_ucf101_i3d_edlnokl_avuc_ced
│    │   └── latest.pth
│    ├── finetune_ucf101_i3d_edlnokl_avuc_debias
│    │   └── latest.pth
│    └── finetune_ucf101_i3d_rpl
│        └── latest.pth
├── slowfast
├── tpn_slowonly
└── tsm

a. Closed Set Evaluation.

Top-K accuracy and mean class accuracy will be reported.

cd experiments/i3d
bash evaluate_i3d_edlnokl_avuc_debias_ucf101.sh
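For reference, the two closed set metrics can be sketched in a few lines. This is an illustrative pure-Python version, not the evaluation code that the script above runs. The function names and the toy scores are assumptions.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)


def mean_class_accuracy(scores, labels):
    """Average of per-class top-1 accuracy, so every class counts equally."""
    correct, total = {}, {}
    for row, label in zip(scores, labels):
        pred = max(range(len(row)), key=lambda c: row[c])
        total[label] = total.get(label, 0) + 1
        correct[label] = correct.get(label, 0) + (pred == label)
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    # Toy example: 4 clips, 3 classes.
    scores = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.5, 0.4, 0.1], [0.2, 0.3, 0.5]]
    labels = [0, 1, 1, 2]
    print(top_k_accuracy(scores, labels, k=1))
    print(top_k_accuracy(scores, labels, k=2))
    print(mean_class_accuracy(scores, labels))
```

Mean class accuracy differs from top-1 accuracy when classes are imbalanced, because a frequent class cannot dominate the average.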

b. Get Uncertainty Threshold.

The threshold value of one model will be reported.

cd experiments/i3d
# run the thresholding with BATCH_SIZE=2 on GPU_ID=0
bash run_get_threshold.sh 0 edlnokl_avuc_debias 2
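One common way to choose such a threshold is to take a high percentile of the uncertainties measured on known-class data, so that most known samples fall below it. The sketch below illustrates that idea only. Whether run_get_threshold.sh works this way is an assumption, and the function name and the 0.95 fraction are hypothetical.

```python
def uncertainty_threshold(uncertainties, known_fraction=0.95):
    """Return the value below (or at) which `known_fraction` of samples fall.

    Assumption: `uncertainties` were computed on known-class data, and
    0.95 is an illustrative default rather than a value from this repo.
    """
    if not uncertainties:
        raise ValueError("need at least one uncertainty value")
    ordered = sorted(uncertainties)
    # Index of the last sample that should still count as "known".
    index = round(len(ordered) * known_fraction) - 1
    index = max(0, min(len(ordered) - 1, index))
    return ordered[index]


if __name__ == "__main__":
    scores = [i / 100 for i in range(1, 101)]
    print(uncertainty_threshold(scores))
```

At test time, a sample whose uncertainty exceeds the threshold would then be rejected as unknown.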

c. Open Set Evaluation and Comparison.

The open set evaluation metrics and openness curves will be reported.

Note: Make sure the threshold value used for each model is the one reported in step b.

cd experiments/i3d
bash run_openness.sh HMDB  # use HMDB-51 test set as the Unknown
bash run_openness.sh MiT  # use MiT-v2 test set as the Unknown
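To illustrate what an open set AUC measures, the sketch below computes the area under the ROC curve for separating unknown from known samples by their uncertainty scores, using the equivalent pairwise-ranking definition. It is a minimal illustration, not the metric code used by run_openness.sh. The function name and the toy scores are assumptions.

```python
def open_set_auc(known_scores, unknown_scores):
    """AUROC for 'unknown vs. known' using uncertainty as the score.

    Equals the probability that a random unknown sample has a higher
    uncertainty than a random known sample (ties count as half).
    """
    if not known_scores or not unknown_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for u in unknown_scores:
        for k in known_scores:
            if u > k:
                wins += 1.0
            elif u == k:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))


if __name__ == "__main__":
    known = [0.1, 0.2, 0.3]     # e.g. uncertainties on UCF-101 test clips
    unknown = [0.25, 0.8, 0.9]  # e.g. uncertainties on HMDB-51 test clips
    print(open_set_auc(known, unknown))
```

This pairwise form is quadratic in the number of samples. It is meant for clarity, and a rank-based implementation is preferable for full test sets.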

d. Out-of-Distribution Detection.

This produces a figure of the uncertainty distribution for the specified model.

cd experiments/i3d
bash run_ood_detection.sh 0 HMDB edlnokl_avuc_debias

e. Draw Open Set Confusion Matrix.

This draws the confusion matrix for the chosen unknown dataset.

cd experiments/i3d
bash run_draw_confmat.sh HMDB  # or MiT
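Conceptually, an open set confusion matrix has one extra row and column for the unknown class. The sketch below shows one way to build it from closed set predictions, uncertainties, and a threshold such as the one reported in step b. The function name and the toy data are hypothetical, and the script above may differ in its details.

```python
def open_set_confusion(preds, uncertainties, labels, num_known, threshold):
    """Build a (num_known + 1) x (num_known + 1) confusion matrix.

    Rows are ground-truth labels and columns are final predictions.
    Index `num_known` stands for the unknown class.
    """
    unknown = num_known
    size = num_known + 1
    matrix = [[0] * size for _ in range(size)]
    for pred, u, label in zip(preds, uncertainties, labels):
        # Reject as unknown when the uncertainty exceeds the threshold.
        final = unknown if u > threshold else pred
        matrix[label][final] += 1
    return matrix


if __name__ == "__main__":
    # Two known classes (0, 1); ground-truth label 2 marks unknown samples.
    preds = [0, 1, 1, 0]
    uncertainties = [0.1, 0.2, 0.9, 0.3]
    labels = [0, 1, 2, 2]
    for row in open_set_confusion(preds, uncertainties, labels, 2, 0.5):
        print(row)
```

In the toy data, the last sample is an unknown clip with low uncertainty, so it is counted in the unknown row under known class 0.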

Training

Again, we take the I3D-based DEAR model as an example.

cd experiments/i3d
bash finetune_i3d_edlnokl_avuc_debias_ucf101.sh 0

Since model training is time-consuming, we strongly recommend running the above training script in the background if you are connected over SSH.

nohup bash finetune_i3d_edlnokl_avuc_debias_ucf101.sh 0 >train.log 2>&1 &
# monitor the training status from any new terminal
tail -f train.log

To visualize the training curves (losses, accuracies, etc.) in TensorBoard:

cd work_dirs/i3d/finetune_ucf101_i3d_edlnokl_avuc_debias/tf_logs
tensorboard --logdir=./ --port 6008

TensorBoard will then print the URL http://localhost:6008. Open it in a web browser (such as Chrome) to monitor the training status.

If you are connected over SSH to a remote server without a display, you can view TensorBoard on your local machine by forwarding the port over SSH:

ssh -L 16008:localhost:6008 {your_remote_name}@{your_remote_ip}

Then, you can access TensorBoard through local port 16008 by opening http://localhost:16008 in your browser.

Model Zoo

The pre-trained weights (checkpoints) are available below.

Model           | Checkpoint | Train Config | Test Config | Open maF1 (%) | Open Set AUC (%) | Closed Set ACC (%)
I3D + DEAR      | ckpt       | train        | test        | 77.24 / 69.98 | 77.08 / 81.54    | 93.89
TSM + DEAR      | ckpt       | train        | test        | 84.69 / 70.15 | 78.65 / 83.92    | 94.48
TPN + DEAR      | ckpt       | train        | test        | 81.79 / 71.18 | 79.23 / 81.80    | 96.30
SlowFast + DEAR | ckpt       | train        | test        | 85.48 / 77.28 | 82.94 / 86.99    | 96.48

Checkpoints of the compared baseline models can be downloaded from Google Drive.

Citation

If you find the code useful in your research, please cite:

@inproceedings{BaoICCV2021DEAR,
  author = "Bao, Wentao and Yu, Qi and Kong, Yu",
  title = "Evidential Deep Learning for Open Set Action Recognition",
  booktitle = "International Conference on Computer Vision (ICCV)",
  year = "2021"
}

License

See Apache-2.0 License

Acknowledgement

In addition to the MMAction2 codebase, this repo contains modified codes from:

We sincerely thank the owners of all these great repos!
