ESTDepth: Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks (CVPR 2021)

Overview

ESTDepth: Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks (CVPR 2021)

Project Page | Video | Paper | Data

We present a novel method for multi-view depth estimation from a single video, which is a critical task in various applications, such as perception, reconstruction and robot navigation. Although previous learning-based methods have demonstrated compelling results, most works estimate depth maps of individual video frames independently, without taking into consideration the strong geometric and temporal coherence among the frames. Moreover, current state-of-the-art (SOTA) models mostly adopt a fully 3D convolution network for cost regularization and therefore require high computational cost, thus limiting their deployment in real-world applications. Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer to explicitly associate geometric and temporal correlation with multiple estimated depth maps. Furthermore, to reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network consisting of a 2D context-aware network and a 3D matching network which learn 2D context information and 3D disparity cues separately.

Here is the official repo for the paper:

Table of contents



Requirements and Installation

This code is implemented in PyTorch.

The code has been tested on the following system:

To install, first clone this repo and install all dependencies:

conda env create -f environment.yml

Option: install apex to enable synchronized batch normalization

git clone https://github.com/NVIDIA/apex.git
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Dataset

Please also cite the original papers if you use any of them in your work.

Dataset Notes on Dataset Split
ScanNet see ./data/scannet_split/
7scenes see ./data/7scenes/test.txt

Train a new model

In the training stage, our model takes a sequence of 5 frames as input, with a batch size of 4 sequences on 4 GPUs. We use the following code to train a model:

python -m torch.distributed.launch --nproc_per_node=4 train_hybrid.py  --using_apex  --sync_bn \
--datapath /userhome/35/xxlong/dataset/scannet_whole/  \
--testdatapath /userhome/35/xxlong/dataset/scannet_test/ \
--reloadscan True \
--batch_size 1 --seq_len 5 --mode train --summary_freq 10 \
--epochs 7 --lr 0.00004 --lrepochs 2,4,6,8:2 \
--logdir ./logs/hybrid_res50_ndepths64 \
--resnet 50 --ndepths 64 --IF_EST_transformer False \
--depth_min 0.1 --depth_max 10. |  tee -a ./logs/hybrid_res50_ndepths64/log.txt
bash train_hybrid.sh

Evaluation

Once the model is trained, the following command is used to evaluate test images given the trained_model.

Our model has two testing modes: Joint and ESTM

For Joint mode, run:

bash eval_hybrid.sh

For ESTM mode, run:

bash eval_hybrid_seq.sh

License

ESTDepth is MIT-licensed. The license applies to the pre-trained models as well.

Citation

Please cite as

@InProceedings{Long_2021_CVPR,
    author    = {Long, Xiaoxiao and Liu, Lingjie and Li, Wei and Theobalt, Christian and Wang, Wenping},
    title     = {Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {8258-8267}
Owner
Xiaoxiao Long Ph.D. student in HKU
A PyTorch port of the Neural 3D Mesh Renderer

Neural 3D Mesh Renderer (CVPR 2018) This repo contains a PyTorch implementation of the paper Neural 3D Mesh Renderer by Hiroharu Kato, Yoshitaka Ushik

Daniilidis Group University of Pennsylvania 1k Jan 09, 2023
Making self-supervised learning work on molecules by using their 3D geometry to pre-train GNNs. Implemented in DGL and Pytorch Geometric.

3D Infomax improves GNNs for Molecular Property Prediction Video | Paper We pre-train GNNs to understand the geometry of molecules given only their 2D

Hannes Stärk 95 Dec 30, 2022
View model summaries in PyTorch!

torchinfo (formerly torch-summary) Torchinfo provides information complementary to what is provided by print(your_model) in PyTorch, similar to Tensor

Tyler Yep 1.5k Jan 05, 2023
A simple, clean TensorFlow implementation of Generative Adversarial Networks with a focus on modeling illustrations.

IllustrationGAN A simple, clean TensorFlow implementation of Generative Adversarial Networks with a focus on modeling illustrations. Generated Images

268 Nov 27, 2022
Music Classification: Beyond Supervised Learning, Towards Real-world Applications

Music Classification: Beyond Supervised Learning, Towards Real-world Applications

104 Dec 15, 2022
A PyTorch re-implementation of the paper 'Exploring Simple Siamese Representation Learning'. Reproduced the 67.8% Top1 Acc on ImageNet.

Exploring simple siamese representation learning This is a PyTorch re-implementation of the SimSiam paper on ImageNet dataset. The results match that

Taojiannan Yang 72 Nov 09, 2022
DCSAU-Net: A Deeper and More Compact Split-Attention U-Net for Medical Image Segmentation

DCSAU-Net: A Deeper and More Compact Split-Attention U-Net for Medical Image Segmentation By Qing Xu, Wenting Duan and Na He Requirements pytorch==1.1

Qing Xu 20 Dec 09, 2022
Numenta published papers code and data

Numenta research papers code and data This repository contains reproducible code for selected Numenta papers. It is currently under construction and w

Numenta 293 Jan 06, 2023
NVIDIA container runtime

nvidia-container-runtime A modified version of runc adding a custom pre-start hook to all containers. If environment variable NVIDIA_VISIBLE_DEVICES i

NVIDIA Corporation 938 Jan 06, 2023
OverFeat is a Convolutional Network-based image classifier and feature extractor.

OverFeat OverFeat is a Convolutional Network-based image classifier and feature extractor. OverFeat was trained on the ImageNet dataset and participat

593 Dec 08, 2022
Fully Automatic Page Turning on Real Scores

Fully Automatic Page Turning on Real Scores This repository contains the corresponding code for our extended abstract Henkel F., Schwaiger S. and Widm

Florian Henkel 7 Jan 02, 2022
A higher performance pytorch implementation of DeepLab V3 Plus(DeepLab v3+)

A Higher Performance Pytorch Implementation of DeepLab V3 Plus Introduction This repo is an (re-)implementation of Encoder-Decoder with Atrous Separab

linhua 326 Nov 22, 2022
Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement (NeurIPS 2020)

MTTS-CAN: Multi-Task Temporal Shift Attention Networks for On-Device Contactless Vitals Measurement Paper Xin Liu, Josh Fromm, Shwetak Patel, Daniel M

Xin Liu 106 Dec 30, 2022
Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019). A PyTorch implementation.

Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set —— PyTorch implementation This is an unofficial offici

Sicheng Xu 833 Dec 28, 2022
QAHOI: Query-Based Anchors for Human-Object Interaction Detection (paper)

QAHOI QAHOI: Query-Based Anchors for Human-Object Interaction Detection (paper) Requirements PyTorch = 1.5.1 torchvision = 0.6.1 pip install -r requ

38 Dec 29, 2022
(NeurIPS 2020) Wasserstein Distances for Stereo Disparity Estimation

Wasserstein Distances for Stereo Disparity Estimation Accepted in NeurIPS 2020 as Spotlight. [Project Page] Wasserstein Distances for Stereo Disparity

Divyansh Garg 92 Dec 12, 2022
The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter

FAPIS The official implementation of the CVPR 2021 paper FAPIS: a Few-shot Anchor-free Part-based Instance Segmenter Introduction This repo is primari

Khoi Nguyen 8 Dec 11, 2022
Picasso: A CUDA-based Library for Deep Learning over 3D Meshes

The Picasso Library is intended for complex real-world applications with large-scale surfaces, while it also performs impressively on the small-scale applications over synthetic shape manifolds. We h

97 Dec 01, 2022
Simple PyTorch implementations of Badnets on MNIST and CIFAR10.

Simple PyTorch implementations of Badnets on MNIST and CIFAR10.

Vera 75 Dec 13, 2022
python library for invisible image watermark (blind image watermark)

invisible-watermark invisible-watermark is a python library and command line tool for creating invisible watermark over image.(aka. blink image waterm

Shield Mountain 572 Jan 07, 2023