Official Pytorch implementation of RePOSE (ICCV2021)

Related tags

Deep LearningRePOSE
Overview

RePOSE: Iterative Rendering and Refinement for 6D Object Detection (ICCV2021) [Link]

overview

Abstract

We present RePOSE, a fast iterative refinement method for 6D object pose estimation. Prior methods perform refinement by feeding zoomed-in input and rendered RGB images into a CNN and directly regressing an update of a refined pose. Their runtime is slow due to the computational cost of CNN, which is especially prominent in multiple-object pose refinement. To overcome this problem, RePOSE leverages image rendering for fast feature extraction using a 3D model with a learnable texture. We call this deep texture rendering, which uses a shallow multi-layer perceptron to directly regress a view-invariant image representation of an object. Furthermore, we utilize differentiable Levenberg-Marquardt (LM) optimization to refine a pose fast and accurately by minimizing the feature-metric error between the input and rendered image representations without the need of zooming in. These image representations are trained such that differentiable LM optimization converges within few iterations. Consequently, RePOSE runs at 92 FPS and achieves state-of-the-art accuracy of 51.6% on the Occlusion LineMOD dataset - a 4.1% absolute improvement over the prior art, and comparable result on the YCB-Video dataset with a much faster runtime.

Prerequisites

  • Python >= 3.6
  • Pytorch == 1.9.0
  • Torchvision == 0.10.0
  • CUDA == 10.1

Downloads

Installation

  1. Set up the python environment:
    $ pip install torch==1.9.0 torchvision==0.10.0
    $ pip install Cython==0.29.17
    $ sudo apt-get install libglfw3-dev libglfw3
    $ pip install -r requirements.txt
    
    # Install Differentiable Renderer
    $ cd renderer
    $ python3 setup.py install
    
  2. Compile cuda extensions under lib/csrc:
    ROOT=/path/to/RePOSE
    cd $ROOT/lib/csrc
    export CUDA_HOME="/usr/local/cuda-10.1"
    cd ../ransac_voting
    python setup.py build_ext --inplace
    cd ../camera_jacobian
    python setup.py build_ext --inplace
    cd ../nn
    python setup.py build_ext --inplace
    cd ../fps
    python setup.py
    
  3. Set up datasets:
    $ ROOT=/path/to/RePOSE
    $ cd $ROOT/data
    
    $ ln -s /path/to/linemod linemod
    $ ln -s /path/to/linemod_orig linemod_orig
    $ ln -s /path/to/occlusion_linemod occlusion_linemod
    
    $ cd $ROOT/data/model/
    $ unzip pretrained_models.zip
    
    $ cd $ROOT/cache/LinemodTest
    $ unzip ape.zip benchvise.zip .... phone.zip
    $ cd $ROOT/cache/LinemodOccTest
    $ unzip ape.zip can.zip .... holepuncher.zip
    

Testing

We have 13 categories (ape, benchvise, cam, can, cat, driller, duck, eggbox, glue, holepuncher, iron, lamp, phone) on the LineMOD dataset and 8 categories (ape, can, cat, driller, duck, eggbox, glue, holepuncher) on the Occlusion LineMOD dataset. Please choose the one category you like (replace ape with another category) and perform testing.

Evaluate the ADD(-S) score

  1. Generate the annotation data:
    python run.py --type linemod cls_type ape model ape
    
  2. Test:
    # Test on the LineMOD dataset
    $ python run.py --type evaluate --cfg_file configs/linemod.yaml cls_type ape model ape
    
    # Test on the Occlusion LineMOD dataset
    $ python run.py --type evaluate --cfg_file configs/linemod.yaml test.dataset LinemodOccTest cls_type ape model ape
    

Visualization

  1. Generate the annotation data:
    python run.py --type linemod cls_type ape model ape
    
  2. Visualize:
    # Visualize the results of the LineMOD dataset
    python run.py --type visualize --cfg_file configs/linemod.yaml cls_type ape model ape
    
    # Visualize the results of the Occlusion LineMOD dataset
    python run.py --type visualize --cfg_file configs/linemod.yaml test.dataset LinemodOccTest cls_type ape model ape
    

Citation

@InProceedings{Iwase_2021_ICCV,
    author    = {Iwase, Shun and Liu, Xingyu and Khirodkar, Rawal and Yokota, Rio and Kitani, Kris M.},
    title     = {RePOSE: Fast 6D Object Pose Refinement via Deep Texture Rendering},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {3303-3312}
}

Acknowledgement

Our code is largely based on clean-pvnet and our rendering code is based on neural_renderer. Thank you so much for making these codes publicly available!

Contact

If you have any questions about the paper and implementation, please feel free to email me ([email protected])! Thank you!

Owner
Shun Iwase
Carnegie Mellon University, Robotics Institute
Shun Iwase
PyTorch implementation of ShapeConv: Shape-aware Convolutional Layer for RGB-D Indoor Semantic Segmentation.

Shape-aware Convolutional Layer (ShapeConv) PyTorch implementation of ShapeConv: Shape-aware Convolutional Layer for RGB-D Indoor Semantic Segmentatio

Hanchao Leng 82 Dec 29, 2022
Makes patches from huge resolution .svs slide files using openslide

openslide_patcher Makes patches from huge resolution .svs slide files using openslide Example collage I made from outputs:

2 Dec 23, 2021
Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"

Focal Transformer This is the official implementation of our Focal Transformer -- "Focal Self-attention for Local-Global Interactions in Vision Transf

Microsoft 486 Dec 20, 2022
A python bot to move your mouse every few seconds to appear active on Skype, Teams or Zoom as you go AFK. 🐭 đŸ€–

PyMouseBot If you're from GT and annoyed with SGVPN idle timeouts while working on development laptop, You might find this useful. A python cli bot to

Oaker Min 6 Oct 24, 2022
TensorRT examples (Jetson, Python/C++)(object detection)

TensorRT examples (Jetson, Python/C++)(object detection)

Nobuo Tsukamoto 53 Dec 22, 2022
arxiv-sanity, but very lite, simply providing the core value proposition of the ability to tag arxiv papers of interest and have the program recommend similar papers.

arxiv-sanity, but very lite, simply providing the core value proposition of the ability to tag arxiv papers of interest and have the program recommend similar papers.

Andrej 671 Dec 31, 2022
Blender add-on: Add to Cameras menu: View → Camera, View → Add Camera, Camera → View, Previous Camera, Next Camera

Blender add-on: Camera additions In 3D view, it adds these actions to the View|Cameras menu: View → Camera : set the current camera to the 3D view Vie

German Bauer 11 Feb 08, 2022
PyTorch implementation of Munchausen Reinforcement Learning based on DQN and SAC. Handles discrete and continuous action spaces

Exploring Munchausen Reinforcement Learning This is the project repository of my team in the "Advanced Deep Learning for Robotics" course at TUM. Our

Mohamed Amine Ketata 10 Mar 10, 2022
《Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching》(CVPR 2020)

This contains the codes for cross-view geo-localization method described in: Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching, CVPR2020.

41 Oct 27, 2022
[ICCV' 21] "Unsupervised Point Cloud Pre-training via Occlusion Completion"

OcCo: Unsupervised Point Cloud Pre-training via Occlusion Completion This repository is the official implementation of paper: "Unsupervised Point Clou

Hanchen 204 Dec 24, 2022
VGGVox models for Speaker Identification and Verification trained on the VoxCeleb (1 & 2) datasets

VGGVox models for speaker identification and verification This directory contains code to import and evaluate the speaker identification and verificat

338 Dec 27, 2022
Full Resolution Residual Networks for Semantic Image Segmentation

Full-Resolution Residual Networks (FRRN) This repository contains code to train and qualitatively evaluate Full-Resolution Residual Networks (FRRNs) a

Toby Pohlen 274 Oct 27, 2022
ColossalAI-Examples - Examples of training models with hybrid parallelism using ColossalAI

ColossalAI-Examples This repository contains examples of training models with Co

HPC-AI Tech 185 Jan 09, 2023
Official codebase for Pretrained Transformers as Universal Computation Engines.

universal-computation Overview Official codebase for Pretrained Transformers as Universal Computation Engines. Contains demo notebook and scripts to r

Kevin Lu 210 Dec 28, 2022
Yet another video caption

Yet another video caption

Fan Zhimin 5 May 26, 2022
ML From Scratch

ML from Scratch MACHINE LEARNING TOPICS COVERED - FROM SCRATCH Linear Regression Logistic Regression K Means Clustering K Nearest Neighbours Decision

Tanishq Gautam 66 Nov 02, 2022
A little software to generate and save Julia or Mandelbrot's Fractals.

Julia-Mandelbrot-s-Fractals A little software to generate and save Julia or Mandelbrot's Fractals. Dependencies : Python 3.7 or more. (Also possible t

Olivier 0 Jul 09, 2022
Transformer part of 12th place solution in Riiid! Answer Correctness Prediction

kaggle_riiid Transformer part of 12th place solution in Riiid! Answer Correctness Prediction. Please see here for more information. Execution You need

Sakami Kosuke 2 Apr 23, 2022
An Evaluation of Generative Adversarial Networks for Collaborative Filtering.

An Evaluation of Generative Adversarial Networks for Collaborative Filtering. This repository was developed by Fernando B. Pérez Maurera. Fernando is

Fernando Benjamín PÉREZ MAURERA 0 Jan 19, 2022
BridgeGAN - Tensorflow implementation of Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation.

Bridging the Gap between Label- and Reference based Synthesis(ICCV 2021) Tensorflow implementation of Bridging the Gap between Label- and Reference-ba

huangqiusheng 8 Jul 13, 2022