[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers

Related tags

Deep LearningVisTR
Overview

VisTR: End-to-End Video Instance Segmentation with Transformers

This is the official implementation of the VisTR paper:

Installation

We provide instructions how to install dependencies via conda. First, clone the repository locally:

git clone https://github.com/Epiphqny/vistr.git

Then, install PyTorch 1.6 and torchvision 0.7:

conda install pytorch==1.6.0 torchvision==0.7.0

Install pycocotools

conda install cython scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
pip install git+https://github.com/youtubevos/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"

Compile DCN module(requires GCC>=5.3, cuda>=10.0)

cd models/dcn
python setup.py build_ext --inplace

Preparation

Download and extract 2019 version of YoutubeVIS train and val images with annotations from CodeLab or YoutubeVIS. We expect the directory structure to be the following:

VisTR
├── data
│   ├── train
│   ├── val
│   ├── annotations
│   │   ├── instances_train_sub.json
│   │   ├── instances_val_sub.json
├── models
...

Download the pretrained DETR models on COCO and save it to the pretrained path.

Training

Training of the model requires at least 32g memory GPU, we performed the experiment on 32g V100 card.

To train baseline VisTR on a single node with 8 gpus for 16 epochs, run:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --backbone resnet101/50 --ytvos_path /path/to/ytvos --masks --pretrained_weights /path/to/pretrained_path

Inference

python inference.py --masks --model_path /path/to/model_weights --save_path /path/to/results.json

Models

We provide baseline VisTR models, and plan to include more in future. AP is computed on YouTubeVIS dataset by submitting the result json file to the CodeLab system, and inference time is calculated by pure model inference time (without data-loading and post-processing).

name backbone FPS mask AP model md5
0 VisTR R50 69.9 34.4 vistr_r50(Please wait)
1 VisTR R101 57.7 36.5 vistr_r101 2b8d412225121fb1694427ab69a40656

License

VisTR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Acknowledgement

We would like to thank the DETR open-source project for its awesome work, part of the code are modified from its project.

Citation

Please consider citing our paper in your publications if the project helps your research. BibTeX reference is as follow.

@inproceedings{wang2020end,
  title={End-to-End Video Instance Segmentation with Transformers},
  author={Wang, Yuqing and Xu, Zhaoliang and Wang, Xinlong and Shen, Chunhua and Cheng, Baoshan and Shen, Hao and Xia, Huaxia},
  booktitle =  {Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}
Owner
Yuqing Wang
Computer vision. Instance segmentation.
Yuqing Wang
571 Dec 25, 2022
Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

One-Shot Voice Conversion with Weight Adaptive Instance Normalization By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain. This rep

31 Dec 07, 2022
masscan + nmap + Finger

说明 个人根据使用习惯修改masnmap而来的一个小工具。调用masscan做全端口扫描,再调用nmap做服务识别,最后调用Finger做Web指纹识别。工具使用场景适合风险探测排查、众测等。 使用方法 安装依赖 pip3 install -r requirements.txt -i https:/

Ryan 3 Mar 25, 2022
这是一个yolox-pytorch的源码,可以用于训练自己的模型。

YOLOX:You Only Look Once目标检测模型在Pytorch当中的实现 目录 性能情况 Performance 实现的内容 Achievement 所需环境 Environment 小技巧的设置 TricksSet 文件下载 Download 训练步骤 How2train 预测步骤

Bubbliiiing 613 Jan 05, 2023
Makes patches from huge resolution .svs slide files using openslide

openslide_patcher Makes patches from huge resolution .svs slide files using openslide Example collage I made from outputs:

2 Dec 23, 2021
A framework that constructs deep neural networks, autoencoders, logistic regressors, and linear networks

A framework that constructs deep neural networks, autoencoders, logistic regressors, and linear networks without the use of any outside machine learning libraries - all from scratch.

Kordel K. France 2 Nov 14, 2022
"Learning Free Gait Transition for Quadruped Robots vis Phase-Guided Controller"

PhaseGuidedControl The current version is developed based on the old version of RaiSim series, and possibly requires further modification. It will be

X-Mechanics 12 Oct 21, 2022
This is the repository of shape matching algorithm Iterative Rotations and Assignments (IRA)

Description This is the repository of shape matching algorithm Iterative Rotations and Assignments (IRA), described in the publication [1]. Directory

MAMMASMIAS Consortium 6 Nov 14, 2022
Re-implementation of the vector capsule with dynamic routing

VectorCapsule Re-implementation of the vector capsule with dynamic routing We implement the vector capsule and dynamic routing via graph neural networ

ZhenchaoTang 10 Feb 10, 2022
Source code and dataset of the paper "Contrastive Adaptive Propagation Graph Neural Networks forEfficient Graph Learning"

CAPGNN Source code and dataset of the paper "Contrastive Adaptive Propagation Graph Neural Networks forEfficient Graph Learning" Paper URL: https://ar

1 Mar 12, 2022
SASM - simple crossplatform IDE for NASM, MASM, GAS and FASM assembly languages

SASM (SimpleASM) - простая кроссплатформенная среда разработки для языков ассемблера NASM, MASM, GAS, FASM с подсветкой синтаксиса и отладчиком. В SA

Dmitriy Manushin 5.6k Jan 06, 2023
Deep Inertial Prediction (DIPr)

Deep Inertial Prediction For more information and context related to this repo, please refer to our website. Getting Started (non Docker) Note: you wi

Arcturus Industries 12 Nov 11, 2022
A python module for scientific analysis of 3D objects based on VTK and Numpy

A lightweight and powerful python module for scientific analysis and visualization of 3d objects.

Marco Musy 1.5k Jan 06, 2023
The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch Railway

Openspoor The openspoor package is intended to allow easy transformation between different geographical and topological systems commonly used in Dutch

7 Aug 22, 2022
Discovering Interpretable GAN Controls [NeurIPS 2020]

GANSpace: Discovering Interpretable GAN Controls Figure 1: Sequences of image edits performed using control discovered with our method, applied to thr

Erik Härkönen 1.7k Jan 03, 2023
Code for NeurIPS2021 submission "A Surrogate Objective Framework for Prediction+Programming with Soft Constraints"

This repository is the code for NeurIPS 2021 submission "A Surrogate Objective Framework for Prediction+Programming with Soft Constraints". Edit 2021/

10 Dec 20, 2022
Morphable Detector for Object Detection on Demand

Morphable Detector for Object Detection on Demand (ICCV 2021) PyTorch implementation of the paper Morphable Detector for Object Detection on Demand. I

9 Feb 23, 2022
Code for the published paper : Learning to recognize rare traffic sign

Improving traffic sign recognition by active search This repo contains code for the paper : "Learning to recognise rare traffic signs" How to use this

samsja 4 Jan 05, 2023
1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime

1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime

Lihe Yang 209 Jan 01, 2023
Learning Modified Indicator Functions for Surface Reconstruction

Learning Modified Indicator Functions for Surface Reconstruction In this work, we propose a learning-based approach for implicit surface reconstructio

4 Apr 18, 2022