Official source code of Fast Point Transformer, CVPR 2022

Overview

Fast Point Transformer

Project Page | Paper

This repository contains the official source code and data for our paper:

Fast Point Transformer
Chunghyun Park, Yoonwoo Jeong, Minsu Cho, and Jaesik Park
POSTECH GSAI & CSE
CVPR, 2022, New Orleans.

An Overview of the proposed pipeline

Overview

This work introduces Fast Point Transformer that consists of a new lightweight self-attention layer. Our approach encodes continuous 3D coordinates, and the voxel hashing-based architecture boosts computational efficiency. The proposed method is demonstrated with 3D semantic segmentation and 3D detection. The accuracy of our approach is competitive to the best voxel based method, and our network achieves 129 times faster inference time than the state-of-the-art, Point Transformer, with a reasonable accuracy trade-off in 3D semantic segmentation on S3DIS dataset.

Citation

If you find our code or paper useful, please consider citing our paper:

@inproceedings{park2022fast,
 title={{Fast Point Transformer}},
 author={Chunghyun Park and Yoonwoo Jeong and Minsu Cho and Jaesik Park},
 booktitle={Proceedings of the {IEEE/CVF} Conference on Computer Vision and Pattern Recognition (CVPR)},
 year={2022}
}

Experiments

1. S3DIS Area 5 test

We denote MinkowskiNet42 trained with this repository as MinkowskiNet42. We use voxel size 4cm for both MinkowskiNet42 and our Fast Point Transformer.

Model Latency (sec) mAcc (%) mIoU (%) Reference
PointTransformer 18.07 76.5 70.4 Codes from the authors
MinkowskiNet42 0.08 74.1 67.2 Checkpoint
  + rotation average 0.66 75.1 69.0 -
FastPointTransformer 0.14 76.6 69.2 Checkpoint
  + rotation average 1.13 77.6 71.0 -

2. ScanNetV2 validation

Model Voxel Size mAcc (%) mIoU (%) Reference
MinkowskiNet42 2cm - 72.2 Official GitHub
MinkowskiNet42 2cm 81.4 72.1 Checkpoint
FastPointTransformer 2cm 81.2 72.5 Checkpoint
MinkowskiNet42 5cm 76.3 67.0 Checkpoint
FastPointTransformer 5cm 78.9 70.0 Checkpoint
MinkowskiNet42 10cm 70.8 60.7 Checkpoint
FastPointTransformer 10cm 76.1 66.5 Checkpoint

Installation

This repository is developed and tested on

  • Ubuntu 18.04 and 20.04
  • Conda 4.11.0
  • CUDA 11.1
  • Python 3.8.13
  • PyTorch 1.7.1 and 1.10.0
  • MinkowskiEngine 0.5.4

Environment Setup

You can install the environment by using the provided shell script:

~$ git clone --recursive [email protected]:POSTECH-CVLab/FastPointTransformer.git
~$ cd FastPointTransformer
~/FastPointTransformer$ bash setup.sh fpt
~/FastPointTransformer$ conda activate fpt

Training & Evaluation

First of all, you need to download the datasets (ScanNetV2 and S3DIS), and preprocess them as:

(fpt) ~/FastPointTransformer$ python src/data/preprocess_scannet.py # you need to modify the data path
(fpt) ~/FastPointTransformer$ python src/data/preprocess_s3dis.py # you need to modify the data path

And then, locate the provided meta data of each dataset (src/data/meta_data) with the preprocessed dataset following the structure below:

${data_dir}
├── scannetv2
│   ├── meta_data
│   │   ├── scannetv2_train.txt
│   │   ├── scannetv2_val.txt
│   │   └── ...
│   └── scannet_processed
│       ├── train
│       │   ├── scene0000_00.ply
│       │   ├── scene0000_01.ply
│       │   └── ...
│       └── test
└── s3dis
    ├── meta_data
    │   ├── area1.txt
    │   ├── area2.txt
    │   └── ...
    └── s3dis_processed
        ├── Area_1
        │   ├── conferenceRoom_1.ply
        │   ├── conferenceRoom_2.ply
        │   └── ...
        ├── Area_2
        └── ...

After then, you can train and evalaute a model by using the provided python scripts (train.py and eval.py) with configuration files in the config directory. For example, you can train and evaluate Fast Point Transformer with voxel size 4cm on S3DIS dataset via the following commands:

(fpt) ~/FastPointTransformer$ python train.py config/s3dis/train_fpt.gin
(fpt) ~/FastPointTransformer$ python eval.py config/s3dis/eval_fpt.gin {checkpoint_file} # use -r option for rotation averaging.

Consistency Score

You need to generate predictions via the following command:

(fpt) ~/FastPointTransformer$ python -m src.cscore.prepare {checkpoint_file} -m {model_name} -v {voxel_size} # This takes hours.

Then, you can calculate the consistency score (CScore) with:

(fpt) ~/FastPointTransformer$ python -m src.cscore.calculate {prediction_dir} # This takes seconds.

3D Object Detection using VoteNet

Please refer this repository.

Acknowledgement

Our code is based on the MinkowskiEngine. We also thank Hengshuang Zhao for providing the code of Point Transformer. If you use our model, please consider citing them as well.

TensorFlow implementation of "Learning from Simulated and Unsupervised Images through Adversarial Training"

Simulated+Unsupervised (S+U) Learning in TensorFlow TensorFlow implementation of Learning from Simulated and Unsupervised Images through Adversarial T

Taehoon Kim 569 Dec 29, 2022
Official implementation of Rich Semantics Improve Few-Shot Learning (BMVC, 2021)

Rich Semantics Improve Few-Shot Learning Paper Link Abstract : Human learning benefits from multi-modal inputs that often appear as rich semantics (e.

Mohamed Afham 11 Jul 26, 2022
This is Official implementation for "Pose-guided Feature Disentangling for Occluded Person Re-Identification Based on Transformer" in AAAI2022

PFD:Pose-guided Feature Disentangling for Occluded Person Re-identification based on Transformer This repo is the official implementation of "Pose-gui

Tao Wang 93 Dec 18, 2022
Python KNN model: Predicting a probability of getting a work visa. Tableau: Non-immigrant visas over the years.

The value of international students to the United States. Probability of getting a non-immigrant visa. Project timeline: Jan 2021 - April 2021 Project

Zinaida Dvoskina 2 Nov 21, 2021
ShuttleNet: Position-aware Fusion of Rally Progress and Player Styles for Stroke Forecasting in Badminton (AAAI 2022)

ShuttleNet: Position-aware Rally Progress and Player Styles Fusion for Stroke Forecasting in Badminton (AAAI 2022) Official code of the paper ShuttleN

Wei-Yao Wang 11 Nov 30, 2022
RealTime Emotion Recognizer for Machine Learning Study Jam's demo

Emotion recognizer Table of contents Clone project Dataset Install dependencies Main program Demo 1. Clone project git clone https://github.com/GDSC20

Google Developer Student Club - UIT 1 Oct 05, 2021
duralava is a neural network which can simulate a lava lamp in an infinite loop.

duralava duralava is a neural network which can simulate a lava lamp in an infinite loop. Example This is not a real lava lamp but a "fake" one genera

Maximilian Bachl 87 Dec 20, 2022
Alpha-Zero - Telegram Group Manager Bot Written In Python Using Pyrogram

✨ Alpha Zero Bot ✨ Telegram Group Manager Bot + Userbot Written In Python Using

1 Feb 17, 2022
Supervised & unsupervised machine-learning techniques are applied to the database of weighted P4s which admit Calabi-Yau hypersurfaces.

Weighted Projective Spaces ML Description: The database of 5-vectors describing 4d weighted projective spaces which admit Calabi-Yau hypersurfaces are

Ed Hirst 3 Sep 08, 2022
[CVPR 2022 Oral] MixFormer: End-to-End Tracking with Iterative Mixed Attention

MixFormer The official implementation of the CVPR 2022 paper MixFormer: End-to-End Tracking with Iterative Mixed Attention [Models and Raw results] (G

Multimedia Computing Group, Nanjing University 235 Jan 03, 2023
Pyramid Scene Parsing Network, CVPR2017.

Pyramid Scene Parsing Network by Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia, details are in project page. Introduction This

Hengshuang Zhao 1.5k Jan 05, 2023
v objective diffusion inference code for JAX.

v-diffusion-jax v objective diffusion inference code for JAX, by Katherine Crowson (@RiversHaveWings) and Chainbreakers AI (@jd_pressman). The models

Katherine Crowson 186 Dec 21, 2022
List of content farm sites like g.penzai.com.

内容农场网站清单 Google 中文搜索结果包含了相当一部分的内容农场式条目,比如「小 X 知识网」「小 X 百科网」。此种链接常会 302 重定向其主站,页面内容为自动生成,大量堆叠关键字,揉杂一些爬取到的内容,完全不具可读性和参考价值。 尤为过分的是,该类网站可能有成千上万个分身域名被 Goog

WDMPA 541 Jan 03, 2023
Starter Code for VALUE benchmark

StarterCode for VALUE Benchmark This is the starter code for VALUE Benchmark [website], [paper]. This repository currently supports all baseline model

VALUE Benchmark 73 Dec 09, 2022
(Py)TOD: Tensor-based Outlier Detection, A General GPU-Accelerated Framework

(Py)TOD: Tensor-based Outlier Detection, A General GPU-Accelerated Framework Background: Outlier detection (OD) is a key data mining task for identify

Yue Zhao 127 Jan 05, 2023
Read number plates with https://platerecognizer.com/

HASS-plate-recognizer Read vehicle license plates with https://platerecognizer.com/ which offers free processing of 2500 images per month. You will ne

Robin 69 Dec 30, 2022
A small library of 3D related utilities used in my research.

utils3D A small library of 3D related utilities used in my research. Installation Install via GitHub pip install git+https://github.com/Steve-Tod/util

Zhenyu Jiang 8 May 20, 2022
Drone detection using YOLOv5

This drone detection system uses YOLOv5 which is a family of object detection architectures and we have trained the model on Drone Dataset. Overview I

Tushar Sarkar 27 Dec 20, 2022
Code for paper entitled "Improving Novelty Detection using the Reconstructions of Nearest Neighbours"

NLN: Nearest-Latent-Neighbours A repository containing the implementation of the paper entitled Improving Novelty Detection using the Reconstructions

Michael (Misha) Mesarcik 4 Dec 14, 2022