Voxel Transformer for 3D object detection

Last update: Dec 25, 2022

Voxel Transformer

This repository is a reproduction of Voxel Transformer (VoTr) for 3D object detection.

The code is mainly based on OpenPCDet.

Introduction

We provide code and training configurations of VoTr-SSD and VoTr-TSD on the KITTI and Waymo Open datasets. Checkpoints will not be released.

Important Notes: VoTr generally takes a long time to converge (more than 60 epochs on Waymo), and reproduction needs a GPU with large memory (32 GB). Please follow the instructions strictly and train for a sufficient number of epochs. If you don't have a 32 GB GPU, you can decrease the attention SIZE parameters in the yaml files, but this may harm performance.
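As a rough illustration of the memory workaround above, the sketch below scales every integer stored under a SIZE key in a nested config. The fragment it operates on is hypothetical: the key names BACKBONE_3D, ATTENTION and NAME and the values 48 and 32 are assumptions made for the example, so check the actual layout of the yaml file you train with before applying anything like this.

```python
def scale_attention_size(cfg, factor, key="SIZE"):
    """Return a copy of cfg in which every integer found under `key` is scaled by factor."""
    if isinstance(cfg, dict):
        scaled = {}
        for name, value in cfg.items():
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if name == key and is_int:
                # Never let the size drop below 1.
                scaled[name] = max(1, int(value * factor))
            else:
                scaled[name] = scale_attention_size(value, factor, key)
        return scaled
    if isinstance(cfg, list):
        return [scale_attention_size(item, factor, key) for item in cfg]
    return cfg


# Hypothetical config fragment, NOT copied from the repository's yaml files.
example_cfg = {
    "BACKBONE_3D": {
        "ATTENTION": [
            {"NAME": "local", "SIZE": 48},
            {"NAME": "dilated", "SIZE": 32},
        ]
    }
}

smaller_cfg = scale_attention_size(example_cfg, 0.5)
print(smaller_cfg)
```

The function returns a new structure and leaves the input untouched, so the original and reduced settings can be compared side by side before writing a new yaml file.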

Requirements

The code is tested in the following environment:

Ubuntu 18.04
Python 3.6
PyTorch 1.5
CUDA 10.1
OpenPCDet v0.3.0
spconv v1.2.1
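A quick way to compare a local setup against the tested versions is sketched below. The helper names are made up for this example. It reads the Python version from the standard library and, if PyTorch is importable, its version and CUDA build; the spconv version is left for the caller to fill in, since how that release exposes its version is not assumed here.

```python
import platform

# Versions listed in the tested environment above.
TESTED = {"python": "3.6", "torch": "1.5", "cuda": "10.1", "spconv": "1.2.1"}


def matches_tested(installed, tested):
    """True if the installed version begins with the tested version components."""
    installed_parts = installed.split("+")[0].split(".")  # drop local tags like +cu101
    tested_parts = tested.split(".")
    return installed_parts[:len(tested_parts)] == tested_parts


def collect_installed():
    """Gather the versions that can be read without assuming extra APIs."""
    versions = {"python": platform.python_version()}
    try:
        import torch
        versions["torch"] = torch.__version__
        versions["cuda"] = torch.version.cuda or ""  # None on CPU-only builds
    except ImportError:
        pass
    return versions


def report(installed_versions):
    """Map each tested component to True/False; missing entries count as False."""
    return {
        name: matches_tested(installed_versions.get(name, ""), version)
        for name, version in TESTED.items()
    }


if __name__ == "__main__":
    for name, ok in report(collect_installed()).items():
        print(name, "matches tested version" if ok else "differs or not detected")
```

A mismatch does not necessarily mean the code will fail; it only flags where the local environment departs from the one the code was tested in.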

Installation

a. Clone this repository.

git clone https://github.com/PointsCoder/VOTR.git

b. Install the dependent libraries as follows:

Install the dependent Python libraries:

pip install -r requirements.txt

Install the SparseConv library. We use the implementation from spconv.
- If you use PyTorch 1.1, make sure you install spconv v1.0 (commit 8da6f96) instead of the latest version.
- If you use PyTorch 1.3+, you need to install spconv v1.2. As mentioned by the author of spconv, you need to use their Docker environment if you use PyTorch 1.4+.

c. Compile CUDA operators by running the following command:

python setup.py develop

Training

All the models are trained with Tesla V100 GPUs (32 GB). The KITTI config of votr_ssd is for training with a single GPU. The other configs are for training with 8 GPUs. If you train with a different number of GPUs, you need to adjust the number of training epochs accordingly to reach decent performance.
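To see why the GPU count interacts with the epoch setting, it can help to count optimizer steps. The sketch below is plain arithmetic under simplifying assumptions: a fixed per-GPU batch size, one optimizer step per global batch, and no sampler padding. The dataset size, batch size and epoch count are illustrative numbers, not values taken from the configs, and the sketch does not prescribe how the epochs should be changed.

```python
import math


def total_optimizer_steps(num_samples, batch_per_gpu, num_gpus, epochs):
    """Number of optimizer steps taken over a full training run."""
    global_batch = batch_per_gpu * num_gpus
    steps_per_epoch = math.ceil(num_samples / global_batch)
    return steps_per_epoch * epochs


# Illustrative numbers only.
NUM_SAMPLES = 4000
BATCH_PER_GPU = 2
EPOCHS = 80

steps_8_gpus = total_optimizer_steps(NUM_SAMPLES, BATCH_PER_GPU, 8, EPOCHS)
steps_1_gpu = total_optimizer_steps(NUM_SAMPLES, BATCH_PER_GPU, 1, EPOCHS)
print("8 GPUs:", steps_8_gpus, "steps")
print("1 GPU: ", steps_1_gpu, "steps")
```

With the same epoch count, the single-GPU run takes eight times as many steps, each on a global batch one eighth the size, so the two runs follow different optimization trajectories even though they see the same data.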

The performance of VoTr is quite unstable on KITTI. If you cannot reproduce the results, try running the training several times.
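One way to organize repeated runs is to generate one command per run, each with its own tag so the outputs do not overwrite each other. The sketch below only builds and prints the command strings. It assumes that train.py in this repository still accepts OpenPCDet's --extra_tag option; verify this with python train.py --help before relying on it. The function name and the run0, run1, ... tags are made up for the example.

```python
import shlex


def build_repeat_commands(cfg_file, num_runs, gpu="0"):
    """Build one single-GPU training command per run, each with a distinct tag."""
    commands = []
    for run in range(num_runs):
        args = [
            "python", "train.py",
            "--cfg_file", cfg_file,
            "--extra_tag", f"run{run}",  # assumed option inherited from OpenPCDet
        ]
        quoted = " ".join(shlex.quote(arg) for arg in args)
        commands.append(f"CUDA_VISIBLE_DEVICES={gpu} {quoted}")
    return commands


repeat_commands = build_repeat_commands("cfgs/kitti_models/votr_ssd.yaml", 3)
for command in repeat_commands:
    print(command)
```

Printing the commands rather than launching them keeps the sketch free of side effects; the lines can be pasted into a shell or a job scheduler one at a time.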

models

# votr_ssd.yaml: single-stage votr backbone replacing the spconv backbone
# votr_tsd.yaml: two-stage votr with pv-head

training votr_ssd on kitti

CUDA_VISIBLE_DEVICES=0 python train.py --cfg_file cfgs/kitti_models/votr_ssd.yaml

training other models

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts/dist_train.sh 8 --cfg_file cfgs/waymo_models/votr_tsd.yaml

testing

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 sh scripts/dist_test.sh 8 --cfg_file cfgs/waymo_models/votr_tsd.yaml --eval_all

Citation

If you find this project useful in your research, please consider citing:

@article{mao2021voxel,
  title={Voxel Transformer for 3D Object Detection},
  author={Mao, Jiageng and Xue, Yujing and Niu, Minzhe and others},
  journal={ICCV},
  year={2021}
}
