2D Human Pose estimation using transformers. Implementation in Pytorch

Overview

PE-former: Pose Estimation Transformer

Vision transformer architectures perform very well for image classification tasks. Efforts to solve more challenging vision tasks with transformers rely on convolutional backbones for feature extraction.

POTR is a pure transformer architecture (no CNN backbone) for 2D body pose estimation. It uses an encoder-decoder architecture with a vision transformer as an encoder and a transformer decoder (derived from DETR).

You can use the code in this repository to train and evaluate different POTR configurations on the COCO dataset.

Model

POTR is based on building blocks derived from recent SOTA models. As shown in the figure there are two major components: A Visual Transformer encoder, and a Transformer decoder.

model

The input image is initially converted into tokens following the ViT paradigm. A position embedding is used to help retain the patch-location information. The tokens and the position embedding are used as input to transformer encoder. The transformed tokens are used as the memory input of the transformer decoder. The inputs of the decoder are M learned queries. For each query the network will produce a joint prediction. The output tokens from the transformer decoder are passed through two heads (FFNs).

  • The first is a classification head used to predict the joint type (i.e class) of each query.
  • The second is a regression head that predicts the normalized coordinates (in the range [0,1]) of the joint in the input image.

Predictions that do not correspond to joints are mapped to a "no object" class.

Acknowledgements

The code in this repository is based on the following:

Thank you!

Preparing

Create a python venv and install all the dependencies:

python -m venv pyenv
source pyenv/bin/activate
pip install -r requirements.txt

Training

Here are some CLI examples using the lit_main.py script.

Training POTR with a deit_small encoder, patch size of 16x16 pixels and input resolution 192x256:

python lit_main.py --vit_arch deit_deit_small --patch_size 16 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24

POTR with Xcit_small_p16 encoder:

 python lit_main.py --vit_arch xcit_small_12_p16 --batch_size 42 --input_size 288 384 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24   --vit_weights https://dl.fbaipublicfiles.com/xcit/xcit_small_12_p16_384_dist.pth

POTR with the ViT as Backbone (VAB) configuration:

 python lit_main.py --vit_as_backbone --vit_arch resnet50 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --position_embedding learned_nocls --num_workers 16 --num_queries 100 --dim_feedforward 1536 --accumulate_grad_batches 1

Baseline that uses a resnet50 (pretrained with dino) as an encoder:

 python lit_main.py --vit_arch resnet50 --patch_size 16 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --gpus 1 --num_workers 24 --vit_weights https://dl.fbaipublicfiles.com/dino/dino_resnet50_pretrain/dino_resnet50_pretrain.pth --position_embedding learned_nocls

Check the lit_main.py cli arguments for a complete list.

python lit_main.py --help

Evaluation

Evaluate a trained model using the evaluate.py script.

For example to evaluate POTR with an xcit_small_12_p8 encoder:

python evaluate.py --vit_arch xcit_small_12_p8 --patch_size 8 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384  --position_embedding enc_xcit --num_workers 16 --num_queries 100 --dim_feedforward 1536 --init_weights paper_experiments/xcit_small12_p8_dino_192_256_paper/checkpoints/checkpoint-epoch\=065-AP\=0.736.ckpt --use_det_bbox

Evaluate POTR with a deit_small encoder:

 python evaluate.py --vit_arch deit_deit_small --patch_size 16 --batch_size 42 --input_size 192 256 --hidden_dim 384 --vit_dim 384 --num_workers 24 --init_weights lightning_logs/version_0/checkpoints/checkpoint-epoch\=074-AP\=0.622.ckpt  --use_det_bbox

Set the argument of --init_weights to your model's checkpoint.

Model Zoo

name input params AP AR url
POTR-Deit-dino-p8 192x256 36.4M 70.6 78.1 model
POTR-Xcit-p16 288x384 40.6M 70.2 77.4 model
POTR-Xcit-dino-p16 288x384 40.6M 70.7 77.9 model
POTR-Xcit-dino-p8 192x256 40.5M 71.6 78.7 model
POTR-Xcit-dino-p8 288x384 40.5M 72.6 79.4 model

Check the experiments folder for configuration files and evaluation results.

All trained models and tensorboard training logs can be downloaded from this drive folder.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Owner
Panteleris Paschalis
Panteleris Paschalis
Segmentation models with pretrained backbones. PyTorch.

Python library with Neural Networks for Image Segmentation based on PyTorch. The main features of this library are: High level API (just two lines to

Pavel Yakubovskiy 6.6k Jan 06, 2023
PyTorch implementation of popular datasets and models in remote sensing

PyTorch Remote Sensing (torchrs) (WIP) PyTorch implementation of popular datasets and models in remote sensing tasks (Change Detection, Image Super Re

isaac 222 Dec 28, 2022
StorSeismic: An approach to pre-train a neural network to store seismic data features

StorSeismic: An approach to pre-train a neural network to store seismic data features This repository contains codes and resources to reproduce experi

Seismic Wave Analysis Group 11 Dec 05, 2022
Rasterize with the least efforts for researchers.

utils3d Rasterize and do image-based 3D transforms with the least efforts for researchers. Based on numpy and OpenGL. It could be helpful when you wan

Ruicheng Wang 8 Dec 15, 2022
Combinatorially Hard Games where the levels are procedurally generated

puzzlegen Implementation of two procedurally simulated environments with gym interfaces. IceSlider: the agent needs to reach and stop on the pink squa

Autonomous Learning Group 3 Jun 26, 2022
Train neural network for semantic segmentation (deep lab V3) with pytorch in less then 50 lines of code

Train neural network for semantic segmentation (deep lab V3) with pytorch in 50 lines of code Train net semantic segmentation net using Trans10K datas

17 Dec 19, 2022
MNIST, but with Bezier curves instead of pixels

bezier-mnist This is a work-in-progress vector version of the MNIST dataset. Samples Here are some samples from the training set. Note that, while the

Alex Nichol 15 Jan 16, 2022
Official implementation of EfficientPose

EfficientPose This is the official implementation of EfficientPose. We based our work on the Keras EfficientDet implementation xuannianz/EfficientDet

2 May 17, 2022
Hierarchical Metadata-Aware Document Categorization under Weak Supervision (WSDM'21)

Hierarchical Metadata-Aware Document Categorization under Weak Supervision This project provides a weakly supervised framework for hierarchical metada

Yu Zhang 53 Sep 17, 2022
Sdf sparse conv - Deep Learning on SDF for Classifying Brain Biomarkers

Deep Learning on SDF for Classifying Brain Biomarkers To reproduce the results f

1 Jan 25, 2022
Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs

Implementation for the paper: Probabilistic Entity Representation Model for Reasoning over Knowledge Graphs, Nurendra Choudhary, Nikhil Rao, Sumeet Ka

Nurendra Choudhary 8 Nov 15, 2022
Solution to the Weather4cast 2021 challenge

This code was used for the entry by the team "antfugue" for the Weather4cast 2021 Challenge. Below, you can find the instructions for generating predi

Jussi Leinonen 13 Jan 03, 2023
Stock-Prediction - prediction of stock market movements using sentiment analysis and deep learning.

Stock-Prediction- In this project, we aim to enhance the prediction of stock market movements using sentiment analysis and deep learning. We divide th

5 Jan 25, 2022
Unofficial implementation of MUSIQ (Multi-Scale Image Quality Transformer)

MUSIQ: Multi-Scale Image Quality Transformer Unofficial pytorch implementation of the paper "MUSIQ: Multi-Scale Image Quality Transformer" (paper link

41 Jan 02, 2023
Experiments with Fourier layers on simulation data.

Factorized Fourier Neural Operators This repository contains the code to reproduce the results in our NeurIPS 2021 ML4PS workshop paper, Factorized Fo

Alasdair Tran 57 Dec 25, 2022
Detect roadway lanes using Python OpenCV for project during the 5th semester at DHBW Stuttgart for lecture in digital image processing.

Find Line Detection (Image Processing) Identifying lanes of the road is very common task that human driver performs. It's important to keep the vehicl

LMF 4 Jun 21, 2022
Image process framework based on plugin like imagej, it is esay to glue with scipy.ndimage, scikit-image, opencv, simpleitk, mayavi...and any libraries based on numpy

Introduction ImagePy is an open source image processing framework written in Python. Its UI interface, image data structure and table data structure a

ImagePy 1.2k Dec 29, 2022
Convolutional 2D Knowledge Graph Embeddings resources

ConvE Convolutional 2D Knowledge Graph Embeddings resources. Paper: Convolutional 2D Knowledge Graph Embeddings Used in the paper, but do not use thes

Tim Dettmers 586 Dec 24, 2022
Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN", accepted to ACM MM 2021 BNI Track.

RecycleD Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN

Yunan Zhu 23 Nov 05, 2022
Automate issue discovery for your projects against Lightning nightly and releases.

Automated Testing for Lightning EcoSystem Projects Automate issue discovery for your projects against Lightning nightly and releases. You get CPUs, Mu

Pytorch Lightning 41 Dec 24, 2022