MVS2D: Efficient Multi-view Stereo via Attention-Driven 2D Convolutions

Last update: Jan 04, 2023

Project Page | Paper

If you find our work useful for your research, please consider citing our paper:

@article{DBLP:journals/corr/abs-2104-13325,
  author    = {Zhenpei Yang and
               Zhile Ren and
               Qi Shan and
               Qixing Huang},
  title     = {{MVS2D:} Efficient Multi-view Stereo via Attention-Driven 2D Convolutions},
  journal   = {CoRR},
  volume    = {abs/2104.13325},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.13325},
  eprinttype = {arXiv},
  eprint    = {2104.13325},
  timestamp = {Tue, 04 May 2021 15:12:43 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-13325.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

✏️ Changelog

Nov 27 2021

Initial release. Note that our released code achieves better results than those reported in the initial arXiv preprint. In addition, we include an evaluation on the DTU dataset. We will update our paper soon.

⚙️ Installation

The code has been tested with CUDA 10.1. Use the following commands to install the dependencies:

conda create --name mvs2d python=3.7
conda activate mvs2d

pip install -r requirements.txt

If you have downloaded all data and pretrained models, the folder structure should look like the following. Download links are listed in the Download part of each dataset section at the end of this README.

.
├── configs
├── datasets
├── demo
├── networks
├── scripts
├── pretrained_model
│   ├── demon
│   ├── dtu
│   └── scannet
├── data
│   ├── DeMoN
│   ├── DTU_hr
│   ├── SampleSet
│   ├── ScanNet
│   └── ScanNet_3_frame_jitter_pose.npy
├── splits
│   ├── DeMoN_samples_test_2_frame.npy
│   ├── DeMoN_samples_train_2_frame.npy
│   ├── ScanNet_3_frame_test.npy
│   ├── ScanNet_3_frame_train.npy
│   └── ScanNet_3_frame_val.npy
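
To check the layout before training or testing, a small script like the one below can help. It is a hypothetical helper (EXPECTED_PATHS and missing_paths are not part of the repository): it only checks that the entries from the tree above exist, not that their contents are complete. You only need the entries for the datasets you actually use.

```python
from pathlib import Path

# Hypothetical list copied from the folder tree above (relative to the repo root).
EXPECTED_PATHS = [
    "configs", "datasets", "demo", "networks", "scripts",
    "pretrained_model/demon", "pretrained_model/dtu", "pretrained_model/scannet",
    "data/DeMoN", "data/DTU_hr", "data/SampleSet", "data/ScanNet",
    "data/ScanNet_3_frame_jitter_pose.npy",
    "splits/DeMoN_samples_test_2_frame.npy",
    "splits/DeMoN_samples_train_2_frame.npy",
    "splits/ScanNet_3_frame_test.npy",
    "splits/ScanNet_3_frame_train.npy",
    "splits/ScanNet_3_frame_val.npy",
]


def missing_paths(root="."):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected paths are present.")
```

Run it from the repository root; anything it lists still needs to be downloaded or extracted.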

🎬 Demo

After downloading the pretrained ScanNet models, run the following command to make a prediction on the sample data:

python demo.py --cfg configs/scannet/release.conf

The results are saved to demo.png.

⏳ Training & Testing

We train on 4 NVIDIA V100 GPUs. You may need to modify 'CUDA_VISIBLE_DEVICES' and the batch size to accommodate your GPU resources.
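
If you launch the training scripts from Python, one option is to pass a restricted environment to the subprocess. This is a minimal sketch with hypothetical helpers (train_env, scaled_batch_size). Two assumptions to check against your copy of the scripts: if a script sets CUDA_VISIBLE_DEVICES inline, that value overrides the one passed here and you should edit the script instead; and the linear batch-size scaling below is a common heuristic, not a rule from the paper.

```python
import os


def train_env(gpu_ids):
    """Copy the current environment and expose only `gpu_ids` to CUDA."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def scaled_batch_size(reference_batch, reference_gpus, num_gpus):
    """Scale a total batch size linearly with the GPU count (minimum 1).

    Heuristic assumption: keeps the per-GPU batch size roughly constant.
    """
    return max(1, reference_batch * num_gpus // reference_gpus)
```

For example, subprocess.run(["bash", "scripts/scannet/train.sh"], env=train_env([0, 1]), check=True) would run the ScanNet training script on the first two GPUs, provided the script does not override CUDA_VISIBLE_DEVICES itself.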

ScanNet

Download

data 🔗 split 🔗 pretrained models 🔗 noisy pose 🔗

Training

First, download and extract the ScanNet training data and split. Then run the following command to train our model:

bash scripts/scannet/train.sh

To train the multi-scale attention model, add --robust 1 to the training command in scripts/scannet/train.sh.

To train our model with noisy input poses, add --perturb_pose 1 to the training command in scripts/scannet/train.sh.
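
For intuition about what a noisy input pose is, the sketch below perturbs a 4x4 camera pose with a small random rotation and translation. It is an illustration only: jitter_pose, axis_angle_to_matrix and the default noise magnitudes are hypothetical, and the repository's --perturb_pose option and the precomputed data/ScanNet_3_frame_jitter_pose.npy file may use a different noise model.

```python
import math
import random


def axis_angle_to_matrix(axis, angle):
    """Rodrigues' formula: 3x3 rotation of `angle` radians about unit `axis`."""
    x, y, z = axis
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return [
        [c + x * x * C, x * y * C - z * s, x * z * C + y * s],
        [y * x * C + z * s, c + y * y * C, y * z * C - x * s],
        [z * x * C - y * s, z * y * C + x * s, c + z * z * C],
    ]


def jitter_pose(pose, max_angle_deg=5.0, max_trans=0.1, rng=random):
    """Return a perturbed copy of a 4x4 pose given as a list of lists.

    Hypothetical noise model: a rotation of at most `max_angle_deg` degrees
    about a random axis is applied on the left (R' = dR @ R), and uniform
    noise in [-max_trans, max_trans] is added to each translation component.
    """
    # Random unit axis from a normalised Gaussian sample.
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    n = math.sqrt(sum(c * c for c in v))
    axis = [0.0, 0.0, 1.0] if n == 0.0 else [c / n for c in v]
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    dR = axis_angle_to_matrix(axis, angle)

    R = [row[:3] for row in pose[:3]]
    new_R = [[sum(dR[i][k] * R[k][j] for k in range(3)) for j in range(3)]
             for i in range(3)]
    new_t = [pose[i][3] + rng.uniform(-max_trans, max_trans) for i in range(3)]
    return [new_R[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]
```

Applying the rotation noise through a proper rotation matrix, rather than adding noise to the matrix entries, keeps the perturbed pose a valid rigid transform.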

Testing

First, download and extract the data, split, and pretrained models.

Then run:

bash scripts/scannet/test.sh

You should get results similar to the following:

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.059	0.016	0.026	0.157	0.084	0.964	0.995	0.999	0.108	0.079	0.856	0.974	0.996
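
As a reference for the first eight columns (abs_rel through a3), here is a sketch of their standard definitions from the depth-estimation literature. It assumes pred and gt are already masked to valid (> 0) depths; the repository's own evaluation may differ in masking, depth range, or averaging (for example per image versus per pixel). The abs_diff, abs_diff_median and thre* columns are not covered because their exact definitions are not given here.

```python
import math


def depth_metrics(pred, gt):
    """Standard depth-error metrics over paired, valid (> 0) depth values."""
    n = len(gt)
    pairs = list(zip(pred, gt))
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n)
    # a1/a2/a3: fraction of values with max(p/g, g/p) below 1.25, 1.25^2, 1.25^3.
    ratio = [max(p / g, g / p) for p, g in pairs]
    a1, a2, a3 = (sum(r < 1.25 ** k for r in ratio) / n for k in (1, 2, 3))
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "log10": log10, "rmse": rmse,
            "rmse_log": rmse_log, "a1": a1, "a2": a2, "a3": a3}
```

Lower is better for the error columns; higher is better for a1, a2 and a3.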

SUN3D/RGBD/Scenes11

Download

data 🔗 split 🔗 pretrained models 🔗

Training

First, download and extract the DeMoN training data and split. Then run the following command to train our model:

bash scripts/demon/train.sh

Testing

First, download and extract the data, split, and pretrained models.

Then run:

bash scripts/demon/test.sh

You should get results similar to the following:

dataset rgbd: 160

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.082	0.165	0.047	0.440	0.147	0.921	0.939	0.948	0.325	0.284	0.753	0.894	0.933

dataset scenes11: 256

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.046	0.080	0.018	0.439	0.107	0.976	0.989	0.993	0.155	0.058	0.822	0.945	0.979

dataset sun3d: 160

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.099	0.055	0.044	0.304	0.137	0.893	0.970	0.993	0.224	0.171	0.649	0.890	0.969

-> Done!

depth

abs_rel	sq_rel	log10	rmse	rmse_log	a1	a2	a3	abs_diff	abs_diff_median	thre1	thre3	thre5
0.071	0.096	0.033	0.402	0.127	0.938	0.970	0.981	0.222	0.152	0.755	0.915	0.963
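
The number after each dataset name appears to be its sample count, and the final depth row is consistent with a sample-weighted average of the three per-dataset rows. The check below recomputes three columns from the values printed above; this is an observation from these numbers, not a documented property of the evaluation script, and RESULTS and weighted_mean are hypothetical names.

```python
# Per-dataset results copied from the log above: (sample count, metrics).
RESULTS = {
    "rgbd": (160, {"abs_rel": 0.082, "rmse": 0.440, "a1": 0.921}),
    "scenes11": (256, {"abs_rel": 0.046, "rmse": 0.439, "a1": 0.976}),
    "sun3d": (160, {"abs_rel": 0.099, "rmse": 0.304, "a1": 0.893}),
}


def weighted_mean(results, metric):
    """Average `metric` over datasets, weighting each by its sample count."""
    total = sum(n for n, _ in results.values())
    return sum(n * m[metric] for n, m in results.values()) / total


for metric in ("abs_rel", "rmse", "a1"):
    print(metric, round(weighted_mean(RESULTS, metric), 3))
# abs_rel 0.071
# rmse 0.402
# a1 0.938
```

Because Scenes11 has the most samples (256 of 576), it pulls the combined numbers toward its own row.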

DTU

Download

data 🔗 eval data 🔗 pretrained models 🔗

Training

First, download and extract the DTU training data. Then run the following command to train our model:

bash scripts/dtu/train.sh

Testing

First, download and extract the DTU evaluation data and pretrained models.

The following command performs three steps:
1. Generate depth predictions on the DTU test set.
2. Fuse the depth predictions into a final point cloud.
3. Evaluate the predicted point cloud.
Note that we re-implemented the original MATLAB evaluation code for the DTU dataset in Python.
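
Step 2 starts by lifting each predicted depth map into 3D. The sketch below shows only that back-projection, under a pinhole camera model with assumed intrinsics fx, fy, cx, cy and a camera-to-world pose; backproject is a hypothetical helper, not a function in the repository. The actual fusion code (built on PatchMatchNet) also filters depths by their consistency across views before merging them, which is omitted here.

```python
def backproject(depth, fx, fy, cx, cy, cam_to_world):
    """Lift every pixel with positive depth to a 3D point in world space.

    depth: 2D list indexed as depth[v][u] (row v, column u).
    cam_to_world: 4x4 camera-to-world pose as a list of lists.
    Pinhole model: X_cam = d * ((u - cx) / fx, (v - cy) / fy, 1).
    """
    R = [row[:3] for row in cam_to_world[:3]]
    t = [row[3] for row in cam_to_world[:3]]
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:  # skip invalid or missing depth
                continue
            xc = [(u - cx) / fx * d, (v - cy) / fy * d, d]
            # World point = R @ X_cam + t
            points.append(tuple(sum(R[i][k] * xc[k] for k in range(3)) + t[i]
                                for i in range(3)))
    return points
```

Points from all views, once filtered, are concatenated to form the final point cloud that step 3 evaluates.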

bash scripts/dtu/test.sh

You should get results similar to the following:

Acc 0.4051747996189477
Comp 0.2776021161518006
F-score 0.34138845788537414
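
For reference, the printed F-score value equals the arithmetic mean of Acc and Comp ((0.4052 + 0.2776) / 2 ≈ 0.3414), which DTU results usually report as the overall score. Acc is the mean distance from the predicted points to the ground truth and Comp the mean distance from the ground truth to the prediction; lower is better, in millimetres for DTU. The brute-force sketch below (chamfer_scores is a hypothetical name) shows only these definitions; the official DTU protocol also applies observability masks, point-cloud downsampling and a maximum distance cap, which are omitted.

```python
import math


def _nearest(p, cloud):
    """Euclidean distance from point p to its nearest neighbour in cloud."""
    return min(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in cloud)


def chamfer_scores(pred, gt):
    """Brute-force accuracy, completeness and their mean (overall).

    acc:  mean distance from each predicted point to the ground truth.
    comp: mean distance from each ground-truth point to the prediction.
    """
    acc = sum(_nearest(p, gt) for p in pred) / len(pred)
    comp = sum(_nearest(g, pred) for g in gt) / len(gt)
    return acc, comp, (acc + comp) / 2
```

The brute-force search is quadratic in the number of points; for full DTU scans a KD-tree nearest-neighbour query is the practical choice.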

Acknowledgement

The fusion code for the DTU dataset is heavily based on PatchMatchNet.