RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency Embedding Space for Autonomous Driving

Last update: Nov 29, 2022

Overview

RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency Embedding Space for Autonomous Driving (AAAI2021).

RTS3D is an efficient and accurate stereo 3D object detection method for autonomous driving.

Introduction

RTS3D is the first true real-time system (FPS > 24) for stereo-image 3D detection, and it achieves a 10% improvement in average precision compared with the previous state-of-the-art method. RTS3D requires only RGB images, without synthetic data, instance segmentation, CAD models, or a depth generator.

Highlights

Fast: 33 FPS single-image test speed on the KITTI benchmark at 384×1280 resolution.
Accurate: state-of-the-art accuracy on the KITTI benchmark.
Anchor-free: no 2D or 3D anchors are required.
Easy to deploy: RTS3D uses only conventional convolution operations and MLPs, so it is easy to deploy and accelerate.

RTS3D Baseline and Model Zoo

All experiments were run on Ubuntu 16.04 with PyTorch 1.0.0, CUDA 9.0, Python 3.6, and a single NVIDIA 2080Ti.

IoU Setting 1: Car IoU > 0.5, Pedestrian IoU > 0.25, Cyclist IoU > 0.25

IoU Setting 2: Car IoU > 0.7, Pedestrian IoU > 0.5, Cyclist IoU > 0.5
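
As a minimal sketch, the two settings above can be written down as a lookup table. The dictionary name and the helper `is_true_positive` are hypothetical and not part of the RTS3D code base; the strict `>` comparison simply mirrors the wording above, and the evaluation code shipped with the repository remains the reference.

```python
# IoU thresholds copied from the two settings listed above.
# The names below are illustrative, not taken from the RTS3D repository.
IOU_THRESHOLDS = {
    "setting1": {"Car": 0.5, "Pedestrian": 0.25, "Cyclist": 0.25},
    "setting2": {"Car": 0.7, "Pedestrian": 0.5, "Cyclist": 0.5},
}


def is_true_positive(cls_name, iou, setting="setting2"):
    """Return True if a detection overlaps its matched ground truth enough.

    Hypothetical helper: uses a strict '>' to mirror the text above.
    """
    return iou > IOU_THRESHOLDS[setting][cls_name]
```

For example, a car detection with an IoU of 0.6 counts as a hit under Setting 1 but as a miss under Setting 2, which is why the Setting 2 columns in the tables below are consistently lower.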

Trained on the KITTI train split and evaluated on the val split.
- FCE Space Resolution: 10 * 10 * 10
- Model: (Google Drive), (Baidu Cloud, extraction code: k4uk)

Class	Iteration	FPS	AP BEV IoU Setting1	AP 3D IoU Setting1	AP BEV IoU Setting2	AP 3D IoU Setting2
-	-	-	Easy / Moderate / Hard	Easy / Moderate / Hard	Easy / Moderate / Hard	Easy / Moderate / Hard
Car- Recall-11	1	90.9	89.83, 77.05, 68.28	89.27, 70.12, 61.17	73.20, 53.62, 46.44	60.87, 42.38, 36.44
Car- Recall-40	1	90.9	92.92, 76.17, 66.62	90.35, 71.37, 63.52	78.12, 54.75, 47.09	60.34, 39.32, 32.97
Car- Recall-11	2	45.5	90.41, 78.70, 70.03	90.26, 77.23, 68.28	76.56, 56.46, 48.20	63.65, 44.50, 37.48
Car- Recall-40	2	45.5	95.75, 79.61, 69.69	93.57, 76.64, 66.72	78.12, 54.75, 47.09	63.99, 41.78, 34.96

Trained on the KITTI train split and evaluated on the val split.
- FCE Space Resolution: 10 * 10 * 10
- Recall split: 11
- Iteration: 2
- Model: (Google Drive), (Baidu Cloud, extraction code: 4t4u)

Class	AP BEV IoU Setting1	AP 3D IoU Setting1	AP BEV IoU Setting2	AP 3D IoU Setting2
-	Easy / Moderate / Hard	Easy / Moderate / Hard	Easy / Moderate / Hard	Easy / Moderate / Hard
Car	90.18, 78.46, 69.76	89.88, 76.64, 67.86	74.95, 54.07, 46.78	58.50, 39.74, 34.83
Pedestrian	57.12, 48.82, 40.88	56.36, 48.29, 40.22	32.16, 26.31, 21.28	26.95, 20.77, 19.74
Cyclist	54.48, 35.78, 30.80	53.86, 30.90, 30.52	33.59, 20.80, 20.14	31.05, 20.26, 18.93
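
The Recall-11 and Recall-40 rows above presumably refer to the two interpolated average-precision variants commonly used on KITTI: one samples 11 recall positions (0.0, 0.1, ..., 1.0), the other 40 positions (1/40, 2/40, ..., 1.0). The sketch below illustrates that definition under this assumption. The function `interpolated_ap` is hypothetical, and the `kitti_eval` code in the repository is the authoritative implementation.

```python
import numpy as np


def interpolated_ap(recalls, precisions, num_points):
    """Interpolated AP (in percent) over evenly spaced recall thresholds.

    num_points=11 samples recall at 0.0, 0.1, ..., 1.0.
    num_points=40 samples recall at 1/40, 2/40, ..., 1.0.
    """
    recalls = np.asarray(recalls, dtype=float)
    precisions = np.asarray(precisions, dtype=float)
    if num_points == 11:
        thresholds = np.linspace(0.0, 1.0, 11)
    elif num_points == 40:
        thresholds = np.linspace(1.0 / 40, 1.0, 40)
    else:
        raise ValueError("num_points must be 11 or 40")

    total = 0.0
    for t in thresholds:
        # Best precision achieved at any recall >= t (small tolerance for floats).
        mask = recalls >= t - 1e-9
        total += precisions[mask].max() if mask.any() else 0.0
    return 100.0 * total / num_points
```

Because the 40-point variant skips the recall-0 sample and uses a finer grid, the two numbers for the same detector generally differ, so Recall-11 and Recall-40 results should not be compared directly.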

Installation

Please refer to INSTALL.md

Dataset preparation

Please download the official KITTI 3D object detection dataset and organize the downloaded files as follows:

KM3DNet
├── kitti_format
│   ├── data
│   │   ├── kitti
│   │   │   ├── annotations
│   │   │   ├── calib /000000.txt .....
│   │   │   ├── image (left [0-7480], right [7481-14961], input augmentation)
│   │   │   ├── label /000000.txt .....
│   │   │   ├── train.txt val.txt trainval.txt
│   │   │   ├── mono_results /000000.txt .....
├── src
├── demo_kitti_format
├── readme
├── requirements.txt
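
A quick way to confirm that the layout above is in place is a small path check like the sketch below. The function `missing_entries` and the two lists are hypothetical helpers, not part of the repository. Some entries (for example annotations or mono_results) may be produced by later steps rather than downloaded, so adjust the lists to your stage of the setup.

```python
from pathlib import Path

# Entries expected under <root>/kitti_format/data/kitti, taken from the tree above.
EXPECTED_DIRS = ["annotations", "calib", "image", "label", "mono_results"]
EXPECTED_FILES = ["train.txt", "val.txt", "trainval.txt"]


def missing_entries(root):
    """Return the names from the tree above that are absent under root."""
    kitti = Path(root) / "kitti_format" / "data" / "kitti"
    missing = [d for d in EXPECTED_DIRS if not (kitti / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (kitti / f).is_file()]
    return missing
```

Calling `missing_entries(".")` from the project root returns an empty list once every directory and split file is present.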

Getting Started

Please refer to GETTING_STARTED.md for details on how to use this project.

Acknowledgement

License

RTS3D is released under the MIT License (refer to the LICENSE file for details). Portions of the code are borrowed from CenterNet, iou3d, and kitti_eval (KITTI dataset evaluation). Please refer to the original licenses of these projects (see NOTICE).

Citation

If you find this project useful for your research, please use the following BibTeX entry.

@misc{2012.15072,
Author = {Peixuan Li and Shun Su and Huaici Zhao},
Title = {RTS3D: Real-time Stereo 3D Detection from 4D Feature-Consistency Embedding Space for Autonomous Driving},
Year = {2020},
Eprint = {arXiv:2012.15072},
}
