Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

Last update: Dec 25, 2022

Overview

ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

Code release of our paper ROCA. Check out our video, paper, and website!

If you find our paper or this repository helpful, please cite:

@inproceedings{gumeli2022roca,
  title={ROCA: Robust CAD Model Retrieval and Alignment from a Single Image},
  author={G{\"u}meli, Can and Dai, Angela and Nie{\ss}ner, Matthias},
  booktitle={Proc. Computer Vision and Pattern Recognition (CVPR), IEEE},
  year={2022}
}

Development Environment

We use the following development environment for this project:

NVIDIA RTX 3090 GPU
Intel Xeon W-1370
Ubuntu 20.04
CUDA Version 11.2
cudatoolkit 11.0
PyTorch 1.7
PyTorch3D 0.5 or 0.6
Detectron2 0.3
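
To compare an existing Python environment against the list above, a small check along these lines may help. It is only a sketch: the distribution names (torch, pytorch3d, detectron2) are assumed to match what pip or conda registered on your machine, and the helper check_versions is illustrative rather than part of this repository.

```python
from importlib import metadata

# Versions taken from the list above; the distribution names are assumptions.
EXPECTED = {
    "torch": ("1.7",),
    "pytorch3d": ("0.5", "0.6"),
    "detectron2": ("0.3",),
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, prefixes in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Accept an exact match or a longer version such as "1.7.1+cu110".
        ok = installed is not None and any(
            installed == p or installed.startswith(p + ".") for p in prefixes
        )
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "differs from the list above"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

A mismatch does not necessarily mean the code will fail; it only flags where your setup departs from the one we developed on.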

Installation

This code was developed using anaconda3 with Python 3.8, so we recommend a similar setup.

Run the following command to create the development environment:

$ source setup.sh

For visualizing some demo results or using the data preprocessing code, you need our custom rasterizer. If the provided x86-64 Linux shared object does not work for you, you may install the rasterizer here.

Running the Demo

We provide four sample input images in the network/assets folder. The images were captured with a smartphone and then preprocessed to be compatible with the ROCA format. To run the demo, you first need to download the data and config from this Google Drive folder. The Models folder contains the pre-trained model and the config used to train it, while the Data folder contains the images and dataset.

Assuming contents of the Models directory are in $MODEL_DIR and contents of the Data directory are in $DATA_DIR, you can run:

$ cd network
$ python demo.py --model_path $MODEL_DIR/model_best.pth --data_dir $DATA_DIR/Dataset --config_path $MODEL_DIR/config.yaml

The image overlay and the CAD visualization are displayed one by one. The Open3D mesh visualization is an interactive window where you can inspect the geometry from different viewpoints. Close the Open3D window to continue to the next visualization. You will see similar results to the image above.

For headless visualization, you can specify an output directory where resulting images and meshes are placed:

$ python demo.py --model_path $MODEL_DIR/model_best.pth --data_dir $DATA_DIR/Dataset --config_path $MODEL_DIR/config.yaml --output_dir $OUTPUT_DIR

You may use the --wild option to visualize results with "wild retrieval". Note that we omit the table category in this case due to its large size diversity.
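
If you prefer to launch the demo from a script, the commands above can be assembled programmatically. The sketch below uses only the flags shown in this section. The helper names build_demo_command and missing_inputs are hypothetical, and treating --wild as a switch without a value is an assumption.

```python
import shlex
from pathlib import Path


def build_demo_command(model_dir, data_dir, output_dir=None, wild=False):
    """Assemble the demo.py argument list from the flags shown above."""
    model_dir, data_dir = Path(model_dir), Path(data_dir)
    cmd = [
        "python", "demo.py",
        "--model_path", str(model_dir / "model_best.pth"),
        "--data_dir", str(data_dir / "Dataset"),
        "--config_path", str(model_dir / "config.yaml"),
    ]
    if output_dir is not None:
        # Headless mode: images and meshes are written to this directory.
        cmd += ["--output_dir", str(output_dir)]
    if wild:
        # Assumed to be a flag that takes no value.
        cmd.append("--wild")
    return cmd


def missing_inputs(model_dir, data_dir):
    """List the expected input paths that do not exist yet."""
    model_dir, data_dir = Path(model_dir), Path(data_dir)
    expected = [
        model_dir / "model_best.pth",
        model_dir / "config.yaml",
        data_dir / "Dataset",
    ]
    return [str(p) for p in expected if not p.exists()]


if __name__ == "__main__":
    cmd = build_demo_command("Models", "Data", output_dir="out", wild=True)
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run with cwd set to the network directory, which mirrors the `cd network` step above.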

Preparing Data

Downloading Processed Data (Recommended)

We provide preprocessed images and labels in this Google Drive folder. Download and extract all folders to a desired location before running the training and evaluation code.

Rendering Data

Alternatively, you can render data yourself. Our data preparation code lives in the renderer folder.

Our project depends on the ShapeNet (Chang et al. '15), ScanNet (Dai et al. '16), and Scan2CAD (Avetisyan et al. '18) datasets. For ScanNet, we use the ScanNet25k images, which are provided as a zip file via the ScanNet download script.

Once you have the data, check the renderer/env.sh file for the locations of the different datasets. The meanings of the environment variables are described as inline comments in env.sh.

After editing renderer/env.sh, run the data generation script:

$ cd renderer
$ sh run.sh

Please check run.sh to see how the individual data preprocessing scripts are run, and feel free to customize the data pipeline!

Training and Evaluating Models

Our training code lives in the network directory. Open network/env.sh and edit the environment variables. Make sure the data directories match the locations of the downloaded and extracted folders. If you prepared the data manually, make sure the locations in network/env.sh are consistent with the variables set in renderer/env.sh.
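
One way to catch a mistyped location before a long run is to read the assignments out of an env.sh file and test whether the path-like values exist. The following is a rough sketch, not part of the repository: it assumes env.sh consists of simple KEY=VALUE lines (optionally prefixed with export), it does not expand references to other shell variables, and the variable name DATA_DIR in the usage example is hypothetical.

```python
import shlex
from pathlib import Path


def parse_env(text):
    """Collect simple KEY=VALUE assignments, with or without `export`."""
    variables = {}
    for line in text.splitlines():
        # comments=True drops everything after an unquoted '#'.
        tokens = shlex.split(line, comments=True)
        if tokens and tokens[0] == "export":
            tokens = tokens[1:]
        for token in tokens:
            key, sep, value = token.partition("=")
            if sep and key.isidentifier():
                variables[key] = value
    return variables


def missing_paths(variables):
    """Names of variables whose value looks like a path but does not exist."""
    return [
        name
        for name, value in variables.items()
        if "/" in value and not Path(value).exists()
    ]


if __name__ == "__main__":
    # Hypothetical file contents for illustration only.
    sample = 'export DATA_DIR="/nonexistent_roca_dir/Data"  # dataset root\nE2E=1\n'
    env = parse_env(sample)
    print(env)
    print("missing:", missing_paths(env))
```

To check a real file, pass the result of Path("network/env.sh").read_text() to parse_env. Values built from other variables (for example "$ROOT/Data") will be reported as missing because no expansion is performed.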

After you are done with network/env.sh, run the run.sh script to train a new model or evaluate an existing model based on the environment variables you set in env.sh:

$ cd network
$ sh run.sh

Replicating Experiments from the Main Paper

Based on the configurations in network/env.sh, you can run different ablations from the paper. The default config will run the (final) experiment. You can do the following edits cumulatively for different experiments:

For P+E+W+R, set RETRIEVAL_MODE=resnet_resnet+image
For P+E+W, set RETRIEVAL_MODE=nearest
For P+E, set NOC_WEIGHTS=0
For P, set E2E=0
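
Because the edits are cumulative, each experiment further down the list inherits the settings of the ones above it. The sketch below spells out the resulting overrides under that reading. The helper overrides_for is hypothetical, and the variable names and values are copied from the list above rather than from env.sh itself.

```python
# Edits in the order given above; each step is applied on top of the previous ones.
ABLATION_STEPS = [
    ("P+E+W+R", {"RETRIEVAL_MODE": "resnet_resnet+image"}),
    ("P+E+W", {"RETRIEVAL_MODE": "nearest"}),
    ("P+E", {"NOC_WEIGHTS": "0"}),
    ("P", {"E2E": "0"}),
]


def overrides_for(experiment):
    """Accumulate the env.sh edits up to and including `experiment`."""
    settings = {}
    for name, edit in ABLATION_STEPS:
        settings.update(edit)  # later steps overwrite earlier values
        if name == experiment:
            return settings
    raise KeyError(f"unknown experiment: {experiment}")


if __name__ == "__main__":
    for name, _ in ABLATION_STEPS:
        line = " ".join(f"{k}={v}" for k, v in overrides_for(name).items())
        print(f"{name}: {line}")
```

Under this interpretation, the P experiment ends up with RETRIEVAL_MODE=nearest, NOC_WEIGHTS=0, and E2E=0 all set at once.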

Resources

To get the datasets and gain further insight regarding our implementation, we refer to the following datasets and open-source codebases:

Datasets and Metadata

Libraries

Projects
