MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images

Code for the following paper:

MatryODShka: Real-time 6DoF Video View Synthesis using Multi-Sphere Images
Benjamin Attal, Selena Ling, Aaron Gokaslan, Christian Richardt, James Tompkin
ECCV 2020

High-level overview of approach.

See more at our project page.

If you use this code, please cite:

@inproceedings{Attal:2020:ECCV,
    author    = "Benjamin Attal and Selena Ling and Aaron Gokaslan and Christian Richardt and James Tompkin",
    title     = "{MatryODShka}: Real-time {6DoF} Video View Synthesis using Multi-Sphere Images",
    booktitle = "European Conference on Computer Vision (ECCV)",
    month     = aug,
    year      = "2020",
    url       = "https://visual.cs.brown.edu/matryodshka"
}

Note that our code is based on the code from the paper "Stereo Magnification: Learning View Synthesis using Multiplane Images" by Zhou et al. [1], and on the code from the paper "Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images" by Wang et al. [3]. Please also cite their work.

Setup

  • Create a conda environment from the matryodshka-gpu.yml file.
  • Run ./download_glob.sh to download the files needed for training and testing.
  • Download the dataset as described in the Replica dataset section.

Training the model

See train.py for training the model.

  • To train with transform inverse regularization, use the --transform_inverse_reg flag.

  • To train with CoordNet, use the --coord_net flag.

  • To experiment with different losses (elpips or l2), use the --which_loss flag.

    • To train with spherical weighting on loss maps, use the --spherical_attention flag.
  • To train with a graph convolutional network (GCN), use the --gcn flag. Note that the particular GCN architecture definition we used is from the Pixel2Mesh repo [3].

  • The current scripts support training on the Replica 360 and cubemap datasets and on the RealEstate10K dataset. Use --input_type to switch between these types of input (ODS, PP, REALESTATE_PP).

See scripts/train/*.sh for some sample scripts.
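As a rough illustration of how the flags above combine, the sketch below assembles a train.py command line in Python. It assumes that the boolean options are passed as bare switches and that --which_loss and --input_type take a value; neither is confirmed here, so check train.py and the sample scripts for the exact syntax. The helper name build_train_command is hypothetical and is not part of the repository.

```python
import shlex

# Values listed in this README for the two value-taking flags.
LOSSES = ("elpips", "l2")
INPUT_TYPES = ("ODS", "PP", "REALESTATE_PP")


def build_train_command(input_type="ODS", which_loss="elpips",
                        transform_inverse_reg=False, coord_net=False,
                        spherical_attention=False, gcn=False):
    """Assemble a train.py argument list (hypothetical helper).

    Assumption: boolean options are bare switches. Check train.py for the
    actual flag definitions before relying on this.
    """
    if input_type not in INPUT_TYPES:
        raise ValueError(f"unknown input type: {input_type}")
    if which_loss not in LOSSES:
        raise ValueError(f"unknown loss: {which_loss}")

    cmd = ["python", "train.py",
           "--input_type", input_type,
           "--which_loss", which_loss]
    switches = {
        "--transform_inverse_reg": transform_inverse_reg,
        "--coord_net": coord_net,
        "--spherical_attention": spherical_attention,
        "--gcn": gcn,
    }
    # Append only the switches that were enabled, in a fixed order.
    cmd += [flag for flag, enabled in switches.items() if enabled]
    return cmd


cmd = build_train_command(input_type="ODS", which_loss="elpips",
                          transform_inverse_reg=True, coord_net=True)
print(shlex.join(cmd))
```

The list form can be passed to subprocess.run directly; the printed string is only for inspection.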

Testing the model

See test.py for testing the model with the replica-360 test set.

  • When testing on video frames, e.g. test_video_640x320, include on_video in the --test_type flag.
  • When testing on high-resolution images, include high_res in the --test_type flag.

See scripts/test/*.sh for sample scripts.

Evaluation

See eval.py for evaluating the model, which saves the metric scores into a json file. We evaluate our models on

  • third-view reconstruction quality

    • See scripts/eval/*-reg.sh for a sample script.
  • frame-to-frame reconstruction differences on video sequences to evaluate the effect of transform inverse regularization on temporal consistency.

    • Include on_video when specifying the --eval_type flag.
    • See scripts/eval/*-video.sh for a sample script.
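To make the idea of a frame-to-frame difference concrete, here is a minimal sketch that computes the mean absolute difference between consecutive frames. It is an illustration only and is not necessarily the metric that eval.py computes; the function names are hypothetical, and frames are plain nested lists of made-up grayscale values so the example stays self-contained.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute per-pixel difference between two equally sized frames."""
    total, count = 0.0, 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def temporal_differences(frames):
    """Difference between each pair of consecutive frames in a sequence."""
    return [mean_abs_diff(prev, cur) for prev, cur in zip(frames, frames[1:])]


# Three tiny 2x2 grayscale "frames" with made-up values.
frames = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 1.5]],
]
diffs = temporal_differences(frames)
print(diffs)
```

Lower values mean smaller changes between consecutive output frames, which is the kind of quantity a temporal-consistency comparison looks at.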

Pre-trained model

Download models pre-trained with and without transform inverse regularization by running ./download_model.sh. These can also be found here at the Brown library for archival purposes.

Replica dataset

We rendered a 360 and a cubemap dataset for training from the Facebook Replica Dataset [2]. This data can be found here at the Brown library for archival purposes. You should have access to the following datasets.

  • train_640x320
  • test_640x320
  • test_video_640x320

You can also find here the camera pose information that was used to render the training dataset. Each line of the txt file is formatted as below:

camera_position_x camera_position_y camera_position_z ods_baseline target1_offset_x target1_offset_y target1_offset_z target2_offset_x target2_offset_y target2_offset_z target3_offset_x target3_offset_y target3_offset_z
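The line format above can be read with a few lines of Python. The sketch below assumes the 13 fields are whitespace-separated floating-point numbers in exactly the order listed; the PoseRecord class, the parse_pose_line function, and the example values are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PoseRecord:
    camera_position: Vec3
    ods_baseline: float
    target_offsets: List[Vec3]  # three targets, in file order


def parse_pose_line(line: str) -> PoseRecord:
    """Parse one line of the pose file (assumed whitespace-separated floats)."""
    values = [float(tok) for tok in line.split()]
    if len(values) != 13:
        raise ValueError(f"expected 13 values, got {len(values)}")
    camera_position = tuple(values[0:3])
    ods_baseline = values[3]
    # Fields 4..12 hold three (x, y, z) offsets, one per target view.
    target_offsets = [tuple(values[i:i + 3]) for i in range(4, 13, 3)]
    return PoseRecord(camera_position, ods_baseline, target_offsets)


# Made-up example line, not taken from the released pose file.
example = "0.5 1.2 -0.3 0.064 0.1 0 0 0 0.1 0 0 0 0.1"
record = parse_pose_line(example)
print(record.camera_position, record.ods_baseline, record.target_offsets)
```

The offsets are kept as stored; whether they should be added to the camera position to obtain absolute target positions depends on the rendering scripts and is not assumed here.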

We also have a fork of the Replica dataset codebase which can regenerate our data from scratch. This contains customized rendering scripts that allow output of ODS, equirectangular, and cubemap projection spherical imagery, along with corresponding depth maps.

Note that the 360 dataset we release for download was rendered with an incorrect 90-degree camera rotation around the up vector and a horizontal flip. Regenerating the dataset from our released code fork with the customized rendering scripts will not include this coordinate change. The output model performance should be approximately the same.
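For intuition about what this coordinate change means in image space: on an equirectangular-format image, a rotation about the up vector is a circular shift of the columns, and a horizontal flip reverses each row. The sketch below shows both operations on a toy one-row image. The direction of the rotation and the order in which the two operations were applied to the released data are not stated here, so the sign and order used below are assumptions, and the function names are hypothetical.

```python
def yaw_rotate_equirect(image, degrees):
    """Rotate an equirectangular image about the up axis.

    A yaw rotation is a circular shift of the columns by width * degrees / 360.
    The shift direction for a positive angle is an assumption of this sketch.
    """
    width = len(image[0])
    shift = round(width * degrees / 360.0) % width
    if shift == 0:
        return [list(row) for row in image]
    return [row[-shift:] + row[:-shift] for row in image]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


# Toy 1x8 "image" whose pixel values are their own column indices.
image = [[0, 1, 2, 3, 4, 5, 6, 7]]

changed = flip_horizontal(yaw_rotate_equirect(image, 90))
restored = yaw_rotate_equirect(flip_horizontal(changed), -90)
print(changed, restored)
```

Undoing the change means applying the inverse operations in reverse order, as the last two lines illustrate.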

Exporting the model to ONNX

We export our model to ONNX by first converting the checkpoint into a pb file, which then gets converted to an onnx file with the tf2onnx module. See export.py for exporting the model into a .pb file.

See scripts/export/model-name.sh for a sample script to run export.py, and scripts/export/pb2onnx.sh for a sample script to run pb-to-onnx conversion.
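For orientation, the pb-to-onnx step with tf2onnx typically comes down to one call to its command-line converter. The sketch below builds such a call in Python; it is not a copy of scripts/export/pb2onnx.sh, which may use different options. The file names and the tensor names (input_image:0, output_msi:0) are hypothetical placeholders, and the real names depend on the exported graph.

```python
import shlex


def build_pb2onnx_command(pb_path, onnx_path, inputs, outputs, opset=None):
    """Assemble a tf2onnx.convert call for a frozen graph (hypothetical helper)."""
    cmd = ["python", "-m", "tf2onnx.convert",
           "--graphdef", pb_path,
           "--output", onnx_path,
           "--inputs", ",".join(inputs),
           "--outputs", ",".join(outputs)]
    if opset is not None:
        cmd += ["--opset", str(opset)]
    return cmd


# Placeholder paths and tensor names; replace them with those of your graph.
onnx_cmd = build_pb2onnx_command(
    pb_path="model.pb",
    onnx_path="model.onnx",
    inputs=["input_image:0"],
    outputs=["output_msi:0"],
)
print(shlex.join(onnx_cmd))
```

Passing the list to subprocess.run(onnx_cmd, check=True) would run the conversion, provided tf2onnx is installed in the active environment.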

Unity Application + ONNX to TensorRT Conversion

We are still working on releasing the real-time Unity application and onnx2trt conversion scripts. Please bear with us!

References

[1] Zhou, Tinghui, et al. "Stereo magnification: Learning view synthesis using multiplane images." arXiv preprint arXiv:1805.09817 (2018). https://github.com/google/stereo-magnification

[2] Straub, Julian, et al. "The Replica dataset: A digital replica of indoor spaces." arXiv preprint arXiv:1906.05797 (2019). https://github.com/facebookresearch/Replica-Dataset

[3] Wang, Nanyang, et al. "Pixel2mesh: Generating 3d mesh models from single rgb images." Proceedings of the European Conference on Computer Vision (ECCV). 2018. https://github.com/nywang16/Pixel2Mesh
