Embracing Single Stride 3D Object Detector with Sparse Transformer

Last update: Dec 28, 2022

Related tags

Deep Learning SST

Overview

SST: Single-stride Sparse Transformer

This is the official implementation of the paper:

Embracing Single Stride 3D Object Detector with Sparse Transformer

Authors: Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang

Paper Link (Check again on Monday)

Introduction and Highlights

SST is a single-stride network that maintains the original feature resolution from the beginning to the end of the network. Owing to this single-stride design, SST performs strongly on small-object detection (Pedestrian, Cyclist).
For simplicity, SST is almost identical to the basic PointPillars in MMDetection3D except for the backbone. With this basic setting, SST achieves state-of-the-art performance on Pedestrian and Cyclist and outperforms PointPillars by more than 10 AP at a cost of only 1.5x latency.
SST consists of 6 Sparse Regional Attention (SRA) blocks, which operate on the sparse voxel set. SRA is similar to Submanifold Sparse Convolution (SSC) but much more powerful. Its locality and sparsity guarantee efficiency in the single-stride setting.
SRA can also be used in many other tasks that process sparse point clouds. Our implementation of SRA relies only on the pure Python APIs in PyTorch, without the engineering effort required by the CUDA implementation of sparse convolution.
There is large room for further improvement, for example a second stage, an anchor-free head, IoU scores, and advanced techniques from ViT.
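To make the idea of attention restricted to regions of a sparse voxel set more concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the function names, the 2D integer voxel coordinates, the region size, and the use of raw features as queries, keys and values (no learned projections, no multiple heads) are all assumptions made for illustration. It only shows the two properties mentioned above: empty voxels are never touched (sparsity), and each voxel attends only to voxels in its own region (locality).

```python
import math
from collections import defaultdict


def group_by_region(coords, region_size):
    """Group the indices of non-empty voxels by the region that contains them.

    coords: list of (x, y) integer voxel coordinates (hypothetical 2D layout).
    """
    regions = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        regions[(x // region_size, y // region_size)].append(i)
    return dict(regions)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def regional_attention(coords, feats, region_size):
    """Scaled dot-product self-attention computed inside each region only."""
    dim = len(feats[0])
    scale = 1.0 / math.sqrt(dim)
    out = [None] * len(feats)
    for members in group_by_region(coords, region_size).values():
        for i in members:
            # Similarity of voxel i to every voxel in the same region.
            scores = [
                scale * sum(a * b for a, b in zip(feats[i], feats[j]))
                for j in members
            ]
            weights = softmax(scores)
            # Weighted average of the features in the region.
            out[i] = [
                sum(w * feats[j][d] for w, j in zip(weights, members))
                for d in range(dim)
            ]
    return out


# Four non-empty voxels that fall into two regions of size 4x4.
coords = [(0, 0), (1, 3), (9, 9), (10, 8)]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
regions = group_by_region(coords, region_size=4)
out = regional_attention(coords, feats, region_size=4)
```

The cost here grows with the number of non-empty voxels per region rather than with the size of the full grid, which is the intuition behind keeping a single stride affordable. A practical version would batch the regions into tensors instead of looping in Python.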

Usage

PyTorch >= 1.9 is highly recommended for better support of the checkpoint technique.

Our implementation is based on MMDetection3D, so just follow its getting_started guide and run the script run.sh. You will then get basic SST results after 5~7 hours (depending on your devices).

We only provide the single-stage model here. For our two-stage models, please follow LiDAR-RCNN. Applying other powerful second-stage detectors to our single-stage SST is also a good choice.

Main results

Single-stage Model (based on PointPillars) on Waymo validation split

Model	#Sweeps	Veh_L1	Ped_L1	Cyc_L1
SST_1f	1	73.57	80.01	70.72
SST_3f	3	75.16	83.24	75.96

Note that we train the 3 classes together, so the performance above is a little bit lower than that reported in our paper.
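As a quick way to read the table, the short snippet below computes the per-class difference between the 3-sweep and 1-sweep models. The numbers are copied from the table above; the dictionary layout and variable names are only illustrative.

```python
# Validation results copied from the table above.
results = {
    "SST_1f": {"Veh_L1": 73.57, "Ped_L1": 80.01, "Cyc_L1": 70.72},
    "SST_3f": {"Veh_L1": 75.16, "Ped_L1": 83.24, "Cyc_L1": 75.96},
}

# Per-class gain from using 3 sweeps instead of 1, rounded to 2 decimals.
gains = {
    cls: round(results["SST_3f"][cls] - results["SST_1f"][cls], 2)
    for cls in results["SST_1f"]
}

# Class that benefits most from the extra sweeps.
best = max(gains, key=gains.get)
print(gains, best)
```

In this table the extra sweeps help the two small-object classes more than vehicles, with the largest gain on Cyc_L1.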

TODO

Build SRA block with similar API as Sparse Convolution for more convenient usage.

Acknowledgement

This project is based on the following codebases.

Owner

TuSimple
