This is an official implementation for "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

Overview

DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation

This repo is the official implementation of "DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation". [Paper] [Project]

Update

  • Clean version is released! It currently includes code, data, log and models for the following tasks:
  • 2D human pose estimation
  • 3D human pose estimation
  • Body recovery via a SMPL model

TODO

  • Provide different sample interval checkpoints/logs
  • Add DeciWatch in MMHuman3D

Description

This paper proposes a simple baseline framework for video-based 2D/3D human pose estimation that can achieve 10 times efficiency improvement over existing works without any performance degradation, named DeciWatch. Unlike current solutions that estimate each frame in a video, DeciWatch introduces a simple yet effective sample-denoise-recover framework that only watches sparsely sampled frames, taking advantage of the continuity of human motions and the lightweight pose representation. Specifically, DeciWatch uniformly samples less than 10% video frames for detailed estimation, denoises the estimated 2D/3D poses with an efficient Transformer architecture, and then accurately recovers the rest of the frames using another Transformer-based network. Comprehensive experimental results on three video-based human pose estimation, body mesh recovery tasks and efficient labeling in videos with four datasets validate the efficiency and effectiveness of DeciWatch.

Getting Started

Environment Requirement

DeciWatch has been implemented and tested on Pytorch 1.10.1 with python >= 3.6. It supports both GPU and CPU inference.

Clone the repo:

git clone https://github.com/cure-lab/DeciWatch.git

We recommend you install the requirements using conda:

# conda
source scripts/install_conda.sh

Prepare Data

All the data used in our experiment can be downloaded here.

Google Drive

Baidu Netdisk

Valid data includes:

Dataset Pose Estimator 3D Pose 2D Pose SMPL
Sub-JHMDB SimplePose
3DPW EFT
3DPW PARE
3DPW SPIN
Human3.6M FCN
AIST++ SPIN

Please refer to doc/data.md for detailed data information and data preparing.

Training

Run the commands below to start training:

python train.py --cfg [config file] --dataset_name [dataset name] --estimator [backbone estimator you use] --body_representation [smpl/3D/2D] --sample_interval [sample interval N]

For example, you can train on 3D representation of 3DPW using backbone estimator SPIN with sample interval 10 by:

python train.py --cfg configs/config_pw3d_spin.yaml --dataset_name pw3d --estimator spin --body_representation 3D --sample_interval 10

Note that the training and testing datasets should be downloaded and prepared before training.

You may refer to doc/training.md for more training details.

Evaluation

Results on 2D Pose

Dataset Estimator PCK 0.05 (INPUT/OUTPUT) PCK 0.1 (INPUT/OUTPUT) PCK 0.2 (INPUT/OUTPUT) Download
Sub-JHMDB simplepose 57.30%/79.32% 81.61%/94.27% 93.94%/98.85% Baidu Netdisk / Google Drive

Results on 3D Pose

Dataset Estimator MPJPE (INPUT/OUTPUT) Accel (INPUT/OUTPUT) Download
3DPW SPIN 96.92/93.34 34.68/7.06 Baidu Netdisk / Google Drive
3DPW EFT 90.34/89.02 32.83/6.84 Baidu Netdisk / Google Drive
3DPW PARE 78.98/77.16 25.75/6.90 Baidu Netdisk / Google Drive
AIST++ SPIN 107.26/71.27 33.37/5.68 Baidu Netdisk / Google Drive
Human3.6M FCN 54.56/52.83 19.18/1.47 Baidu Netdisk / Google Drive

Results on SMPL

Dataset Estimator MPJPE (INPUT/OUTPUT) Accel (INPUT/OUTPUT) MPVPE (INPUT/OUTPUT) Download
3DPW SPIN 100.13/97.53 35.53/8.38 114.39/112.84 Baidu Netdisk / Google Drive
3DPW EFT 91.60/92.56 33.57/8.7 5 110.34/109.27 Baidu Netdisk / Google Drive
3DPW PARE 80.44/81.76 26.77/7.24 94.88/95.68 Baidu Netdisk / Google Drive
AIST++ SPIN 108.25/82.10 33.83/7.27 137.51/106.08 Baidu Netdisk / Google Drive

Noted that although our main contribution is the efficiency improvement, using DeciWatch as post processing is also helpful for accuracy and smoothness improvement.

You may refer to doc/evaluate.md for evaluate details.

Quick Demo

Run the commands below to visualize demo:

python demo.py --cfg [config file] --dataset_name [dataset name] --estimator [backbone estimator you use] --body_representation [smpl/3D/2D] --sample_interval [sample interval N]

You are supposed to put corresponding images with the data structure:

|-- data
    |-- videos
        |-- pw3d 
            |-- downtown_enterShop_00
                |-- image_00000.jpg
                |-- ...
            |-- ...
        |-- jhmdb
            |-- catch
            |-- ...
        |-- aist
            |-- gWA_sFM_c01_d27_mWA2_ch21.mp4
            |-- ...
        |-- ...

For example, you can train on 3D representation of 3DPW using backbone estimator SPIN with sample interval 10 by:

python demo.py --cfg configs/config_pw3d_spin.yaml --dataset_name pw3d --estimator spin --body_representation 3D --sample_interval 10

Please refer to the dataset website for the raw images. You may change the config in lib/core/config.py for different visualization parameters.

You may refer to doc/visualize.md for visualization details.

Citing DeciWatch

If you find this repository useful for your work, please consider citing it as follows:

@article{zeng2022deciwatch,
  title={DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation},
  author={Zeng, Ailing and Ju, Xuan and Yang, Lei and Gao, Ruiyuan and Zhu, Xizhou and Dai, Bo and Xu, Qiang},
  journal={arXiv preprint arXiv:2203.08713},
  year={2022}
}

Please remember to cite all the datasets and backbone estimators if you use them in your experiments.

Acknowledgement

Many thanks to Xuan Ju for her great efforts to clean almost the original code!!!

License

This code is available for non-commercial scientific research purposes as defined in the LICENSE file. By downloading and using this code you agree to the terms in the LICENSE. Third-party datasets and software are subject to their respective licenses.

STRIVE: Scene Text Replacement In Videos

STRIVE: Scene Text Replacement In Videos Dataset Types: RoboText SynthText RealWorld videos RoboText : Videos of texts collected using navigation robo

15 Jul 11, 2022
SpiroMask: Measuring Lung Function Using Consumer-Grade Masks

SpiroMask: Measuring Lung Function Using Consumer-Grade Masks Anonymised repository for paper submitted for peer review at ACM HEALTH (October 2021).

0 May 10, 2022
Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

ATLOP Code for AAAI 2021 paper Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling. If you make use of this co

Wenxuan Zhou 146 Nov 29, 2022
Code for the CVPR2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition"

Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition This repository contains code for the CVPR2021 paper "Patch-NetV

QVPR 368 Jan 06, 2023
FewBit — a library for memory efficient training of large neural networks

FewBit FewBit — a library for memory efficient training of large neural networks. Its efficiency originates from storage optimizations applied to back

24 Oct 22, 2022
Large-Scale Pre-training for Person Re-identification with Noisy Labels (LUPerson-NL)

LUPerson-NL Large-Scale Pre-training for Person Re-identification with Noisy Labels (LUPerson-NL) The repository is for our CVPR2022 paper Large-Scale

43 Dec 26, 2022
The Deep Learning with Julia book, using Flux.jl.

Deep Learning with Julia DL with Julia is a book about how to do various deep learning tasks using the Julia programming language and specifically the

Logan Kilpatrick 67 Dec 25, 2022
BridgeGAN - Tensorflow implementation of Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation.

Bridging the Gap between Label- and Reference based Synthesis(ICCV 2021) Tensorflow implementation of Bridging the Gap between Label- and Reference-ba

huangqiusheng 8 Jul 13, 2022
FAST Aiming at the problems of cumbersome steps and slow download speed of GNSS data

FAST Aiming at the problems of cumbersome steps and slow download speed of GNSS data, a relatively complete set of integrated multi-source data download terminal software fast is developed. The softw

ChangChuntao 23 Dec 31, 2022
Pytorch ImageNet1k Loader with Bounding Boxes.

ImageNet 1K Bounding Boxes For some experiments, you might wanna pass only the background of imagenet images vs passing only the foreground. Here, I'v

Amin Ghiasi 11 Oct 15, 2022
Local Multi-Head Channel Self-Attention for FER2013

LHC-Net Local Multi-Head Channel Self-Attention This repository is intended to provide a quick implementation of the LHC-Net and to replicate the resu

12 Jan 04, 2023
A visualisation tool for Deep Reinforcement Learning

DRLVIS - Visualising Deep Reinforcement Learning Created by Marios Sirtmatsis with the support of Alex Bäuerle. DRLVis is an application used for visu

Marios Sirtmatsis 1 Nov 04, 2021
Housing Price Prediction

This project aim was to predict the price of houses in the Boston area during the great financial crisis through regression, as well as classify houses into different quality categories according to

Florian Klement 1 Jan 27, 2022
Yoloxkeypointsegment - An anchor-free version of YOLO, with a simpler design but better performance

Introduction 关键点版本:已完成 全景分割版本:已完成 实例分割版本:已完成 YOLOX is an anchor-free version of

23 Oct 20, 2022
Elucidating Robust Learning with Uncertainty-Aware Corruption Pattern Estimation

Elucidating Robust Learning with Uncertainty-Aware Corruption Pattern Estimation Introduction 📋 Official implementation of Explainable Robust Learnin

JeongEun Park 6 Apr 19, 2022
Implementation of "Semi-supervised Domain Adaptive Structure Learning"

Semi-supervised Domain Adaptive Structure Learning - ASDA This repo contains the source code and dataset for our ASDA paper. Illustration of the propo

3 Dec 13, 2021
A Simulated Optimal Intrusion Response Game

Optimal Intrusion Response An OpenAI Gym interface to a MDP/Markov Game model for optimal intrusion response of a realistic infrastructure simulated u

Kim Hammar 10 Dec 09, 2022
Fuzzing JavaScript Engines with Aspect-preserving Mutation

DIE Repository for "Fuzzing JavaScript Engines with Aspect-preserving Mutation" (in S&P'20). You can check the paper for technical details. Environmen

gts3.org (<a href=[email protected])"> 190 Dec 11, 2022
Implementation of "With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition, BMVC, 2021" in PyTorch

Multimodal Temporal Context Network (MTCN) This repository implements the model proposed in the paper: Evangelos Kazakos, Jaesung Huh, Arsha Nagrani,

Evangelos Kazakos 13 Nov 24, 2022
Tensorflow 2 Object Detection API kurulumu, GPU desteği, custom model hazırlama

Tensorflow 2 Object Detection API Bu tutorial, TensorFlow 2.x'in kararlı sürümü olan TensorFlow 2.3'ye yöneliktir. Bu, görüntülerde / videoda nesne a

46 Nov 20, 2022