Deep High-Resolution Representation Learning for Human Pose Estimation

Overview

Deep High-Resolution Representation Learning for Human Pose Estimation (accepted to CVPR2019)

News

Introduction

This is an official pytorch implementation of Deep High-Resolution Representation Learning for Human Pose Estimation. In this work, we are interested in the human pose estimation problem with a focus on learning reliable high-resolution representations. Most existing methods recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network. Instead, our proposed network maintains high-resolution representations through the whole process. We start from a high-resolution subnetwork as the first stage, gradually add high-to-low resolution subnetworks one by one to form more stages, and connect the mutli-resolution subnetworks in parallel. We conduct repeated multi-scale fusions such that each of the high-to-low resolution representations receives information from other parallel representations over and over, leading to rich high-resolution representations. As a result, the predicted keypoint heatmap is potentially more accurate and spatially more precise. We empirically demonstrate the effectiveness of our network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.

Illustrating the architecture of the proposed HRNet

Main Results

Results on MPII val

Arch Head Shoulder Elbow Wrist Hip Knee Ankle Mean [email protected]
pose_resnet_50 96.4 95.3 89.0 83.2 88.4 84.0 79.6 88.5 34.0
pose_resnet_101 96.9 95.9 89.5 84.4 88.4 84.5 80.7 89.1 34.0
pose_resnet_152 97.0 95.9 90.0 85.0 89.2 85.3 81.3 89.6 35.0
pose_hrnet_w32 97.1 95.9 90.3 86.4 89.1 87.1 83.3 90.3 37.7

Note:

Results on COCO val2017 with detector having human AP of 56.4 on COCO val2017 dataset

Arch Input size #Params GFLOPs AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
pose_resnet_50 256x192 34.0M 8.9 0.704 0.886 0.783 0.671 0.772 0.763 0.929 0.834 0.721 0.824
pose_resnet_50 384x288 34.0M 20.0 0.722 0.893 0.789 0.681 0.797 0.776 0.932 0.838 0.728 0.846
pose_resnet_101 256x192 53.0M 12.4 0.714 0.893 0.793 0.681 0.781 0.771 0.934 0.840 0.730 0.832
pose_resnet_101 384x288 53.0M 27.9 0.736 0.896 0.803 0.699 0.811 0.791 0.936 0.851 0.745 0.858
pose_resnet_152 256x192 68.6M 15.7 0.720 0.893 0.798 0.687 0.789 0.778 0.934 0.846 0.736 0.839
pose_resnet_152 384x288 68.6M 35.3 0.743 0.896 0.811 0.705 0.816 0.797 0.937 0.858 0.751 0.863
pose_hrnet_w32 256x192 28.5M 7.1 0.744 0.905 0.819 0.708 0.810 0.798 0.942 0.865 0.757 0.858
pose_hrnet_w32 384x288 28.5M 16.0 0.758 0.906 0.825 0.720 0.827 0.809 0.943 0.869 0.767 0.871
pose_hrnet_w48 256x192 63.6M 14.6 0.751 0.906 0.822 0.715 0.818 0.804 0.943 0.867 0.762 0.864
pose_hrnet_w48 384x288 63.6M 32.9 0.763 0.908 0.829 0.723 0.834 0.812 0.942 0.871 0.767 0.876

Note:

Results on COCO test-dev2017 with detector having human AP of 60.9 on COCO test-dev2017 dataset

Arch Input size #Params GFLOPs AP Ap .5 AP .75 AP (M) AP (L) AR AR .5 AR .75 AR (M) AR (L)
pose_resnet_152 384x288 68.6M 35.3 0.737 0.919 0.828 0.713 0.800 0.790 0.952 0.856 0.748 0.849
pose_hrnet_w48 384x288 63.6M 32.9 0.755 0.925 0.833 0.719 0.815 0.805 0.957 0.874 0.763 0.863
pose_hrnet_w48* 384x288 63.6M 32.9 0.770 0.927 0.845 0.734 0.831 0.820 0.960 0.886 0.778 0.877

Note:

Environment

The code is developed using python 3.6 on Ubuntu 16.04. NVIDIA GPUs are needed. The code is developed and tested using 4 NVIDIA P100 GPU cards. Other platforms or GPU cards are not fully tested.

Quick start

Installation

  1. Install pytorch >= v1.0.0 following official instruction. Note that if you use pytorch's version < v1.0.0, you should following the instruction at https://github.com/Microsoft/human-pose-estimation.pytorch to disable cudnn's implementations of BatchNorm layer. We encourage you to use higher pytorch's version(>=v1.0.0)

  2. Clone this repo, and we'll call the directory that you cloned as ${POSE_ROOT}.

  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Make libs:

    cd ${POSE_ROOT}/lib
    make
    
  5. Install COCOAPI:

    # COCOAPI=/path/to/clone/cocoapi
    git clone https://github.com/cocodataset/cocoapi.git $COCOAPI
    cd $COCOAPI/PythonAPI
    # Install into global site-packages
    make install
    # Alternatively, if you do not have permissions or prefer
    # not to install the COCO API into global site-packages
    python3 setup.py install --user
    

    Note that instructions like # COCOAPI=/path/to/install/cocoapi indicate that you should pick a path where you'd like to have the software cloned and then set an environment variable (COCOAPI in this case) accordingly.

  6. Init output(training model output directory) and log(tensorboard log directory) directory:

    mkdir output 
    mkdir log
    

    Your directory tree should look like this:

    ${POSE_ROOT}
    ├── data
    ├── experiments
    ├── lib
    ├── log
    ├── models
    ├── output
    ├── tools 
    ├── README.md
    └── requirements.txt
    
  7. Download pretrained models from our model zoo(GoogleDrive or OneDrive)

    ${POSE_ROOT}
     `-- models
         `-- pytorch
             |-- imagenet
             |   |-- hrnet_w32-36af842e.pth
             |   |-- hrnet_w48-8ef0771d.pth
             |   |-- resnet50-19c8e357.pth
             |   |-- resnet101-5d3b4d8f.pth
             |   `-- resnet152-b121ed2d.pth
             |-- pose_coco
             |   |-- pose_hrnet_w32_256x192.pth
             |   |-- pose_hrnet_w32_384x288.pth
             |   |-- pose_hrnet_w48_256x192.pth
             |   |-- pose_hrnet_w48_384x288.pth
             |   |-- pose_resnet_101_256x192.pth
             |   |-- pose_resnet_101_384x288.pth
             |   |-- pose_resnet_152_256x192.pth
             |   |-- pose_resnet_152_384x288.pth
             |   |-- pose_resnet_50_256x192.pth
             |   `-- pose_resnet_50_384x288.pth
             `-- pose_mpii
                 |-- pose_hrnet_w32_256x256.pth
                 |-- pose_hrnet_w48_256x256.pth
                 |-- pose_resnet_101_256x256.pth
                 |-- pose_resnet_152_256x256.pth
                 `-- pose_resnet_50_256x256.pth
    
    

Data preparation

For MPII data, please download from MPII Human Pose Dataset. The original annotation files are in matlab format. We have converted them into json format, you also need to download them from OneDrive or GoogleDrive. Extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- mpii
    `-- |-- annot
        |   |-- gt_valid.mat
        |   |-- test.json
        |   |-- train.json
        |   |-- trainval.json
        |   `-- valid.json
        `-- images
            |-- 000001163.jpg
            |-- 000003072.jpg

For COCO data, please download from COCO download, 2017 Train/Val is needed for COCO keypoints training and validation. We also provide person detection result of COCO val2017 and test-dev2017 to reproduce our multi-person pose estimation results. Please download from OneDrive or GoogleDrive. Download and extract them under {POSE_ROOT}/data, and make them look like this:

${POSE_ROOT}
|-- data
`-- |-- coco
    `-- |-- annotations
        |   |-- person_keypoints_train2017.json
        |   `-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        |   |-- COCO_test-dev2017_detections_AP_H_609_person.json
        `-- images
            |-- train2017
            |   |-- 000000000009.jpg
            |   |-- 000000000025.jpg
            |   |-- 000000000030.jpg
            |   |-- ... 
            `-- val2017
                |-- 000000000139.jpg
                |-- 000000000285.jpg
                |-- 000000000632.jpg
                |-- ... 

Training and Testing

Testing on MPII dataset using model zoo's models(GoogleDrive or OneDrive)

python tools/test.py \
    --cfg experiments/mpii/hrnet/w32_256x256_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_mpii/pose_hrnet_w32_256x256.pth

Training on MPII dataset

python tools/train.py \
    --cfg experiments/mpii/hrnet/w32_256x256_adam_lr1e-3.yaml

Testing on COCO val2017 dataset using model zoo's models(GoogleDrive or OneDrive)

python tools/test.py \
    --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_coco/pose_hrnet_w32_256x192.pth \
    TEST.USE_GT_BBOX False

Training on COCO train2017 dataset

python tools/train.py \
    --cfg experiments/coco/hrnet/w32_256x192_adam_lr1e-3.yaml \

Other applications

Many other dense prediction tasks, such as segmentation, face alignment and object detection, etc. have been benefited by HRNet. More information can be found at Deep High-Resolution Representation Learning.

Citation

If you use our code or models in your research, please cite with:

@inproceedings{sun2019deep,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Sun, Ke and Xiao, Bin and Liu, Dong and Wang, Jingdong},
  booktitle={CVPR},
  year={2019}
}

@inproceedings{xiao2018simple,
    author={Xiao, Bin and Wu, Haiping and Wei, Yichen},
    title={Simple Baselines for Human Pose Estimation and Tracking},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2018}
}
Owner
HRNet
Code for pose estimation is available at https://github.com/leoxiaobin/deep-high-resolution-net.pytorch
HRNet
This is the latest version of the PULP SDK

PULP-SDK This is the latest version of the PULP SDK, which is under active development. The previous (now legacy) version, which is no longer supporte

78 Dec 07, 2022
Implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

CrossViT : Cross-Attention Multi-Scale Vision Transformer for Image Classification This is an unofficial PyTorch implementation of CrossViT: Cross-Att

Rishikesh (ऋषिकेश) 103 Nov 25, 2022
Cl datasets - PyTorch image dataloaders and utility functions to load datasets for supervised continual learning

Continual learning datasets Introduction This repository contains PyTorch image

berjaoui 5 Aug 28, 2022
Cluttered MNIST Dataset

Cluttered MNIST Dataset A setup script will download MNIST and produce mnist/*.t7 files: luajit download_mnist.lua Example usage: local mnist_clutter

DeepMind 50 Jul 12, 2022
Official implementation of Densely connected normalizing flows

Densely connected normalizing flows This repository is the official implementation of NeurIPS 2021 paper Densely connected normalizing flows. Poster a

Matej Grcić 31 Dec 12, 2022
All supplementary material used by me while TA-ing CS3244: Machine Learning

CS3244-Tutorial-Material All supplementary material used by me while TA-ing CS3244: Machine Learning at NUS School of Computing. What is this? I teach

Rishabh Anand 18 Sep 23, 2022
FastCover: A Self-Supervised Learning Framework for Multi-Hop Influence Maximization in Social Networks by Anonymous.

FastCover: A Self-Supervised Learning Framework for Multi-Hop Influence Maximization in Social Networks by Anonymous.

0 Apr 02, 2021
D²Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

D²Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos This repository contains the implementation for "D²Conv3D: Dynamic Dilated Co

17 Oct 20, 2022
Library for converting from RGB / GrayScale image to base64 and back.

Library for converting RGB / Grayscale numpy images from to base64 and back. Installation pip install -U image_to_base_64 Conversion RGB to base 64 b

Vladimir Iglovikov 16 Aug 28, 2022
SwinTrack: A Simple and Strong Baseline for Transformer Tracking

SwinTrack This is the official repo for SwinTrack. A Simple and Strong Baseline Prerequisites Environment conda (recommended) conda create -y -n SwinT

LitingLin 196 Jan 04, 2023
Suite of 500 procedurally-generated NLP tasks to study language model adaptability

TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas

Belinda Li 20 May 17, 2022
PyTorch implementation of "VRT: A Video Restoration Transformer"

VRT: A Video Restoration Transformer Jingyun Liang, Jiezhang Cao, Yuchen Fan, Kai Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, Luc Van Gool Computer

Jingyun Liang 837 Jan 09, 2023
Pytorch implementation of "Geometrically Adaptive Dictionary Attack on Face Recognition" (WACV 2022)

Geometrically Adaptive Dictionary Attack on Face Recognition This is the Pytorch code of our paper "Geometrically Adaptive Dictionary Attack on Face R

6 Nov 21, 2022
Degree-Quant: Quantization-Aware Training for Graph Neural Networks.

Degree-Quant This repo provides a clean re-implementation of the code associated with the paper Degree-Quant: Quantization-Aware Training for Graph Ne

35 Oct 07, 2022
AVD Quickstart Containerlab

AVD Quickstart Containerlab WARNING This repository is still under construction. It's fully functional, but has number of limitations. For example: RE

Carl Buchmann 3 Apr 10, 2022
Voice Conversion by CycleGAN (语音克隆/语音转换):CycleGAN-VC3

CycleGAN-VC3-PyTorch 中文说明 | English This code is a PyTorch implementation for paper: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectr

Kun Ma 110 Dec 24, 2022
Repo for CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning

CReST in Tensorflow 2 Code for the paper: "CReST: A Class-Rebalancing Self-Training Framework for Imbalanced Semi-Supervised Learning" by Chen Wei, Ki

Google Research 75 Nov 01, 2022
Pseudo-Visual Speech Denoising

Pseudo-Visual Speech Denoising This code is for our paper titled: Visual Speech Enhancement Without A Real Visual Stream published at WACV 2021. Autho

Sindhu 94 Oct 22, 2022
This is the dataset for testing the robustness of various VO/VIO methods

KAIST VIO dataset This is the dataset for testing the robustness of various VO/VIO methods You can download the whole dataset on KAIST VIO dataset Ind

1 Sep 01, 2022