Tensorflow python implementation of "Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos"

Overview

Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos

report PWC

This repository is the official tensorflow python implementation of "Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos" in CVPR 2021 (Oral Presentation) (Best Paper Nominated).

Project Page
TikTok Dataset

Teaser Image

This codebase provides:

  • Inference code
  • Training code
  • Visualization code

Requirements

(This code is tested with tensorflow-gpu 1.14.0, Python 3.7.4, CUDA 10 (version 10.0.130) and cuDNN 7 (version 7.4.2).)

  • numpy
  • imageio
  • matplotlib
  • scikit-image
  • scipy==1.1.0
  • tensorflow-gpu==1.14.0
  • gast==0.2.2
  • Pillow

Installation

Run the following code to install all pip packages:

pip install -r requirements.txt 

In case there is a problem, you can use the following tensorflow docker container "(tensorflow:19.02-py3)":

sudo docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/tensorflow:19.02-py3

Then install the requirements:

pip install -r requirements.txt 

Inference Demo

Input:

The test data dimension should be: 256x256. For any test data you should have 3 .png files: (For an example please take a look at the demo data in "test_data" folder.)

  • name_img.png : The 256x256x3 test image
  • name_mask.png : The 256x256 corresponding mask. You can use any off-the-shelf tools such as removebg to remove the background and get the mask.
  • name_dp.png : The 256x256x3 corresponding DensePose.

Output:

Running the demo generates the following:

  • name.txt : The 256x256 predicted depth
  • name_mesh.obj : The reconstructed mesh. You can use any off-the-shelf tools such as MeshLab to visualize the mesh. Visualization for demo data from different views:

Teaser Image

  • name_normal_1.txt, name_normal_2.txt, name_normal_3.txt : Three 256x256 predicted normal. If you concatenate them in the third axis it will give you the 256x256x3 normal map.
  • name_results.png : visualization of predicted depth heatmap and the predicted normal map. Visualization for demo data:

Teaser Image

Run the demo:

Download the weights from here and extract in the main repository or run this in the main repository:

wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1UOHkmwcWpwt9r11VzOCa_CVamwHVaobV' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1UOHkmwcWpwt9r11VzOCa_CVamwHVaobV" -O model.zip && rm -rf /tmp/cookies.txt

unzip model.zip

Run the following python code:

python HDNet_Inference.py

From line 26 to 29 under "test path and outpath" you can choose the input directory (default: './test_data'), ouput directory (default: './test_data/infer_out') and if you want to save the visualization (default: True).

More Results

Teaser Image

Training

To train the network, go to training folder and read the README file

MATLAB Visualization

If you want to generate visualizations similar to those on the website, go to MATLAB_Visualization folder and run

make_video.m

From lines 7 to 14, you can choose the test folder (default: test_data) and the image name to process (default: 0043). This will generate a video of the prediction from different views (default: "test_data/infer_out/video/0043/video.avi") This process will take around 2 minutes to generate 164 angles.

Note that this visualization will always generate a 672 × 512 video, You may want to resize your video accordingly for your own tested data.

Citation

If you find the code or our dataset useful in your research, please consider citing the paper.

@InProceedings{Jafarian_2021_CVPR_TikTok,
    author    = {Jafarian, Yasamin and Park, Hyun Soo},
    title     = {Learning High Fidelity Depths of Dressed Humans by Watching Social Media Dance Videos},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {12753-12762}} 
Owner
Yasamin Jafarian
PhD Candidate at the University of Minnesota.
Yasamin Jafarian
YOLTv5 rapidly detects objects in arbitrarily large aerial or satellite images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks

YOLTv5 rapidly detects objects in arbitrarily large aerial or satellite images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks.

Adam Van Etten 145 Jan 01, 2023
Our implementation used for the MICCAI 2021 FLARE Challenge titled 'Efficient Multi-Organ Segmentation Using SpatialConfiguartion-Net with Low GPU Memory Requirements'.

Efficient Multi-Organ Segmentation Using SpatialConfiguartion-Net with Low GPU Memory Requirements Our implementation used for the MICCAI 2021 FLARE C

Franz Thaler 3 Sep 27, 2022
Attention-guided gan for synthesizing IR images

SI-AGAN Attention-guided gan for synthesizing IR images This repository contains the Tensorflow code for "Pedestrian Gender Recognition by Style Trans

1 Oct 25, 2021
[NeurIPS 2020] This project provides a strong single-stage baseline for Long-Tailed Classification, Detection, and Instance Segmentation (LVIS).

A Strong Single-Stage Baseline for Long-Tailed Problems This project provides a strong single-stage baseline for Long-Tailed Classification (under Ima

Kaihua Tang 514 Dec 23, 2022
Pretrained models for Jax/Haiku; MobileNet, ResNet, VGG, Xception.

Pre-trained image classification models for Jax/Haiku Jax/Haiku Applications are deep learning models that are made available alongside pre-trained we

Alper Baris CELIK 14 Dec 20, 2022
A denoising diffusion probabilistic model (DDPM) tailored for conditional generation of protein distograms

Denoising Diffusion Probabilistic Model for Proteins Implementation of Denoising Diffusion Probabilistic Model in Pytorch. It is a new approach to gen

Phil Wang 108 Nov 23, 2022
Companion code for the paper Theoretical characterization of uncertainty in high-dimensional linear classification

Companion code for the paper Theoretical characterization of uncertainty in high-dimensional linear classification Usage The required packages are lis

0 Feb 07, 2022
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

LightHuBERT LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT | Github | Huggingface | SUPER

WangRui 46 Dec 29, 2022
Course on computational design, non-linear optimization, and dynamics of soft systems at UIUC.

Computational Design and Dynamics of Soft Systems · This is a repository that contains the source code for generating the lecture notes, handouts, exe

Tejaswin Parthasarathy 4 Jul 21, 2022
Stochastic Extragradient: General Analysis and Improved Rates

Stochastic Extragradient: General Analysis and Improved Rates This repository is the official implementation of the paper "Stochastic Extragradient: G

Hugo Berard 4 Nov 11, 2022
Pytorch implementation of the paper Time-series Generative Adversarial Networks

TimeGAN-pytorch Pytorch implementation of the paper Time-series Generative Adversarial Networks presented at NeurIPS'19. Jinsung Yoon, Daniel Jarrett

Zhiwei ZHANG 21 Nov 24, 2022
[CoRL 2021] A robotics benchmark for cross-embodiment imitation.

x-magical x-magical is a benchmark extension of MAGICAL specifically geared towards cross-embodiment imitation. The tasks still provide the Demo/Test

Kevin Zakka 36 Nov 26, 2022
Applications using the GTN library and code to reproduce experiments in "Differentiable Weighted Finite-State Transducers"

gtn_applications An applications library using GTN. Current examples include: Offline handwriting recognition Automatic speech recognition Installing

Facebook Research 68 Dec 29, 2022
This computer program provides a reference implementation of Lagrangian Monte Carlo in metric induced by the Monge patch

This computer program provides a reference implementation of Lagrangian Monte Carlo in metric induced by the Monge patch. The code was prepared to the final version of the accepted manuscript in AIST

Marcelo Hartmann 2 May 06, 2022
For storing the complete exploration of Visual Question Answering for our B.Tech Project

Multi-Image vqa @authors: Akhilesh, Janhavi, Harsh Paper summary, Ideas tried and their corresponding results: on wiki Other discussions: on discussio

Harsh Raj 3 Jun 16, 2022
Project Aquarium is a SUSE-sponsored open source project aiming at becoming an easy to use, rock solid storage appliance based on Ceph.

Project Aquarium Project Aquarium is a SUSE-sponsored open source project aiming at becoming an easy to use, rock solid storage appliance based on Cep

Aquarist Labs 73 Jul 21, 2022
Pytorch code for our paper "Feedback Network for Image Super-Resolution" (CVPR2019)

Feedback Network for Image Super-Resolution [arXiv] [CVF] [Poster] Update: Our proposed Gated Multiple Feedback Network (GMFN) will appear in BMVC2019

Zhen Li 539 Jan 06, 2023
ICML 21 - Voice2Series: Reprogramming Acoustic Models for Time Series Classification

Voice2Series-Reprogramming Voice2Series: Reprogramming Acoustic Models for Time Series Classification International Conference on Machine Learning (IC

49 Jan 03, 2023
Official code for Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset

Official code for our Interspeech 2021 - Spoken ObjectNet: A Bias-Controlled Spoken Caption Dataset [1]*. Visually-grounded spoken language datasets c

Ian Palmer 3 Jan 26, 2022
Code and Resources for the Transformer Encoder Reasoning Network (TERN)

Transformer Encoder Reasoning Network Code for the cross-modal visual-linguistic retrieval method from "Transformer Reasoning Network for Image-Text M

Nicola Messina 53 Dec 30, 2022