MaskTrackRCNN for video instance segmentation based on mmdetection

Last update: Jan 05, 2023

MaskTrackRCNN for video instance segmentation

Introduction

This repo serves as the official code release of the MaskTrackRCNN model for video instance segmentation described in the tech report:

@article{ Yang2019vis,
  author = {Linjie Yang and Yuchen Fan and Ning Xu},  
  title = {Video instance segmentation},
  journal = {CoRR},
  volume = {abs/1905.04804},
  year = {2019},
  url = {https://arxiv.org/abs/1905.04804}
}

This work introduces a new task, video instance segmentation, which extends image instance segmentation from the image domain to the video domain. The task requires simultaneous detection, segmentation, and tracking of object instances in videos. YouTubeVIS, a new dataset tailored to this task, was collected on top of YouTubeVOS, currently the largest video object segmentation dataset. Sample annotations of a video clip can be seen below. We also propose MaskTrackRCNN, an algorithm that jointly detects, segments, and tracks object instances in a video. It adds a tracking head to the original MaskRCNN model to match objects across frames. An overview of the algorithm is shown below.
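To make the idea of matching objects across frames concrete, here is a minimal sketch, assuming each detection comes with a feature vector. It is not the MaskTrackRCNN tracking head, which is learned as part of the network. The function names, the cosine similarity measure, the greedy assignment, and the threshold are all illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_ids(memory, detections, new_threshold=0.5):
    """Greedily assign an instance id to each detection in the current frame.

    memory: dict mapping instance id -> feature vector seen in earlier frames
            (hypothetical structure, updated in place).
    detections: list of feature vectors for the current frame.
    A detection whose best similarity does not exceed new_threshold
    is treated as a new instance and receives a fresh id.
    """
    ids = []
    taken = set()  # ids already used in this frame
    for feat in detections:
        best_id, best_sim = None, new_threshold
        for inst_id, mem_feat in memory.items():
            if inst_id in taken:
                continue
            sim = cosine(feat, mem_feat)
            if sim > best_sim:
                best_id, best_sim = inst_id, sim
        if best_id is None:
            best_id = max(memory, default=-1) + 1
        memory[best_id] = feat  # remember the latest appearance
        taken.add(best_id)
        ids.append(best_id)
    return ids
```

Calling assign_ids once per frame with the same memory dict keeps identities consistent over time: an object that reappears with a similar feature vector receives its earlier id, and an unfamiliar one starts a new track.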

Installation

This repo is built on mmdetection at commit f3a939f. Please refer to INSTALL.md to install the library. You also need to install a customized COCO API for the YouTubeVIS dataset. You can use the following commands to create a conda environment with all dependencies.

conda create -n MaskTrackRCNN -y
conda activate MaskTrackRCNN
conda install -c pytorch pytorch=0.4.1 torchvision cuda92 -y
conda install -c conda-forge cudatoolkit-dev=9.2 opencv -y
conda install cython -y
pip install git+https://github.com/youtubevos/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"
bash compile.sh
pip install .

You may also need to follow #1 to load MSCOCO pretrained models.

Model training and evaluation

Our model is based on MaskRCNN-resnet50-FPN. It is trained end-to-end on YouTubeVIS, starting from an MSCOCO pretrained checkpoint (link).

Training

Download YouTubeVIS from here.
Symlink the train/validation dataset to the $MMDETECTION/data folder. Put the COCO-style annotations under $MMDETECTION/data/annotations.

mmdetection
├── mmdet
├── tools
├── configs
├── data
│   ├── train
│   ├── val
│   ├── annotations
│   │   ├── instances_train_sub.json
│   │   ├── instances_val_sub.json

Run python3 tools/train.py configs/masktrack_rcnn_r50_fpn_1x_youtubevos.py to train the model. For arguments such as the learning rate and model parameters, please refer to configs/masktrack_rcnn_r50_fpn_1x_youtubevos.py.
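Before launching training, it can help to confirm that the data folder matches the layout listed above. The following is a minimal sketch: the paths are copied from the directory tree, while the helper name missing_paths and the EXPECTED list are hypothetical and not part of this repo.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the mmdetection root.
EXPECTED = [
    "data/train",
    "data/val",
    "data/annotations/instances_train_sub.json",
    "data/annotations/instances_val_sub.json",
]


def missing_paths(root):
    """Return the expected dataset paths that do not exist under root.

    Path.exists() follows symlinks, so a broken symlink is reported as missing.
    """
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]
```

For example, print(missing_paths(".")) run from the mmdetection root prints an empty list when everything is in place.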

Evaluation

Our pretrained model is available for download from Google Drive. Run the following command to evaluate the model on YouTubeVIS.

python3 tools/test_video.py configs/masktrack_rcnn_r50_fpn_1x_youtubevos.py [MODEL_PATH] --out [OUTPUT_PATH] --eval segm

A JSON file containing the predicted results will be generated as OUTPUT_PATH.json. YouTubeVIS currently only supports evaluation on the CodaLab server, so please upload the generated results there to see the actual performance.
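Since evaluation happens remotely, a quick local sanity check of the generated file can save a failed upload. The sketch below only assumes the output is valid JSON. It makes no assumption about the result schema, and the helper name summarize_results is hypothetical.

```python
import json


def summarize_results(path):
    """Load a JSON result file and return a small summary for a sanity check."""
    with open(path) as f:
        results = json.load(f)
    summary = {
        "type": type(results).__name__,
        # Only containers have a meaningful length.
        "count": len(results) if isinstance(results, (list, dict)) else None,
    }
    # If the file is a list of records, report the keys of the first record.
    if isinstance(results, list) and results and isinstance(results[0], dict):
        summary["first_keys"] = sorted(results[0])
    return summary
```

A count of zero, or a file that fails to parse, suggests that the evaluation run did not complete as expected.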

License

This project is released under the Apache 2.0 license.

Contact

If you have any questions regarding the repo, please contact Linjie Yang or create an issue.
