Official code repository for the work: "The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement"

Last update: Dec 14, 2022

Overview

Handheld Multi-Frame Neural Depth Refinement

This is the official code repository for the work: "The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement".

If you use parts of this work, or otherwise take inspiration from it, please consider citing our paper:

@article{chugunov2021implicit,
  title={The Implicit Values of A Good Hand Shake: Handheld Multi-Frame Neural Depth Refinement},
  author={Chugunov, Ilya and Zhang, Yuxuan and Xia, Zhihao and Zhang, Cecilia and Chen, Jiawen and Heide, Felix},
  journal={arXiv preprint arXiv:2111.13738},
  year={2021}
}

Requirements:

Developed using PyTorch 1.10.0 on a Linux x64 machine.
Condensed package requirements are in \requirements.txt. Note that this contains the package versions at the time of publishing; if you update to, for example, a newer version of PyTorch, you will need to watch out for changes in class/function calls.
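
If you want to see which installed packages differ from the frozen versions before running anything, a quick check along these lines may help. This is only a sketch: it assumes requirements.txt uses plain `name==version` pins, and the helper names `parse_pins` and `find_mismatches` are made up for this example (they are not part of the repository).

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for every `name==version` line."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and unpinned entries
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (pinned, installed)}; installed is None if missing."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches
```

Typical use would be `find_mismatches(parse_pins(open("requirements.txt").read()))`. The comparison is an exact string match, so a build with a local suffix (for example a CUDA-specific PyTorch wheel) will be reported as a mismatch even when the base version agrees.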

Data:

Download data from this Google Drive link and unpack into the \data folder
Each folder corresponds to a scene [castle, eagle, elephant, frog, ganesha, gourd, rocks, thinker] and contains four files.
- model.pt is the frozen, trained MLP corresponding to the scene
- frame_bundle.npz is the recorded bundle data (images, depth, and poses)
- reprojected_lidar.npy is the merged LiDAR depth baseline as described in the paper
- snapshot.mp4 is a video of the recorded snapshot for visualization purposes
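
After unpacking, it can be worth confirming that every scene folder contains the four files listed above. The sketch below assumes the folder names match the scene names exactly and that the data root is whatever path you pass in; `missing_files` is a hypothetical helper, not something shipped with the repository.

```python
from pathlib import Path

SCENES = ["castle", "eagle", "elephant", "frog",
          "ganesha", "gourd", "rocks", "thinker"]
EXPECTED_FILES = ["model.pt", "frame_bundle.npz",
                  "reprojected_lidar.npy", "snapshot.mp4"]


def missing_files(data_root):
    """Return 'scene/file' entries that are absent under data_root."""
    root = Path(data_root)
    return [
        f"{scene}/{name}"
        for scene in SCENES
        for name in EXPECTED_FILES
        if not (root / scene / name).is_file()
    ]
```

Calling `missing_files("data")` from the repository root should return an empty list if the download unpacked as described.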

An explanation of the format and contents of the frame bundles (frame_bundle.npz) is given in an interactive format in \0_data_format.ipynb. We recommend you go through this Jupyter notebook before you record your own bundles or otherwise manipulate the data.
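
For a quick look at a bundle outside the notebook, the arrays in an .npz archive can be listed with NumPy. This is a minimal sketch that makes no assumption about the key names stored in frame_bundle.npz; it simply reports whatever it finds. `summarize_npz` is a name invented for this example, and `allow_pickle=True` is included only on the assumption that the bundle may store Python objects.

```python
import numpy as np


def summarize_npz(path):
    """Return {key: (shape, dtype string)} for each array in an .npz archive."""
    summary = {}
    # allow_pickle=True is only needed if the archive stores Python objects
    # (an assumption here); enable it only for files from a trusted source.
    with np.load(path, allow_pickle=True) as bundle:
        for key in bundle.files:
            value = bundle[key]
            summary[key] = (value.shape, str(value.dtype))
    return summary
```

For example, `summarize_npz("data/gourd/frame_bundle.npz")` would list the contents of the gourd bundle, assuming the data was unpacked into a `data/gourd` folder. The notebook remains the reference for what each entry means.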

Project Structure:

HNDR
  ├── checkpoints  
  │   └── // folder for network checkpoints
  ├── data  
  │   └── // folder for recorded bundle data
  ├── utils  
  │   ├── dataloader.py  // dataloader class for bundle data
  │   ├── neural_blocks.py  // MLP blocks and positional encoding
  │   └── utils.py  // miscellaneous helper functions (e.g. grid/patch sample)
  ├── 0_data_format.ipynb  // interactive tutorial for understanding bundle data
  ├── 1_reconstruction.ipynb  // interactive tutorial for depth reconstruction
  ├── model.py  // the learned implicit depth model
  │             // -> reproject points, query MLP for offsets, visualization
  ├── README.md  // a README in the README, how meta
  ├── requirements.txt  // frozen package requirements
  ├── train.py  // wrapper class for arg parsing and setting up training loop
  └── train.sh  // example script to run training

Reconstruction:

The Jupyter notebook \1_reconstruction.ipynb contains an interactive tutorial for depth reconstruction: loading a model, loading a bundle, and generating depth.

Training:

The script \train.sh demonstrates a basic call of \train.py to train a model on the gourd scene data. It contains the following arguments:

checkpoint_path - path to save model and tensorboard checkpoints
device - device for training [cpu, cuda]
bundle_path - path to the bundle data

For other training arguments, see the argument parser section of \train.py.
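
To illustrate how the three arguments above fit together, here is a rough sketch of an argument parser that accepts them. It is not the parser from train.py: the `--` flag spelling, the defaults, and which arguments are required are all assumptions made for this example.

```python
import argparse


def build_parser():
    """Illustrative parser for the three documented training arguments."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented training arguments")
    # Required/default choices below are assumptions, not taken from train.py.
    parser.add_argument("--checkpoint_path", type=str, required=True,
                        help="path to save model and tensorboard checkpoints")
    parser.add_argument("--device", type=str, default="cuda",
                        choices=["cpu", "cuda"],
                        help="device for training")
    parser.add_argument("--bundle_path", type=str, required=True,
                        help="path to the bundle data")
    return parser
```

With this sketch, `build_parser().parse_args(["--checkpoint_path", "checkpoints/gourd", "--bundle_path", "data/gourd/frame_bundle.npz"])` yields a namespace with `device` set to `"cuda"`. The paths shown are placeholders; see \train.sh for the actual call.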

Best of luck,
Ilya
