Code for the CVPR paper Zero-shot Instance Segmentation

Last update: Dec 13, 2022

Overview

Code for the CVPR 2021 paper "Zero-shot Instance Segmentation".

Code requirements

Python 3.7
NVIDIA GPU
PyTorch 1.1.0
GCC >= 5.4
NCCL 2
Other Python libraries listed in requirements.txt

Install

```
conda create -n zsi python=3.7 -y
conda activate zsi

conda install pytorch=1.1.0 torchvision=0.3.0 cudatoolkit=10.0 -c pytorch

pip install cython && pip --no-cache-dir install -r requirements.txt

python setup.py develop
```
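Before training, it can help to confirm that the active environment matches the requirements listed above. The following is a minimal sketch that uses only the Python standard library. `version_tuple` and `check_env` are hypothetical helpers, not part of the ZSI code base. The sketch only checks the Python version and whether `torch` is importable and at version 1.1. It does not validate CUDA, GCC or NCCL.

```python
import importlib.util
import sys


def version_tuple(version):
    """Turn '1.1.0' or '1.1.0a0+abc123' into (1, 1, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_env():
    """Report Python and PyTorch versions against this README's requirements."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "python_ok": sys.version_info[:2] == (3, 7),
        "torch": None,
        "torch_ok": False,
    }
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the check still runs without PyTorch

        report["torch"] = str(torch.__version__)
        report["torch_ok"] = version_tuple(str(torch.__version__))[:2] == (1, 1)
    return report


if __name__ == "__main__":
    for key, value in check_env().items():
        print(key, value)
```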

Dataset preparation

Download the train and test annotation files for ZSI from annotations and put all JSON label files in:
```
data/coco/annotations/
```
Download the MSCOCO-2014 dataset and unzip the images to:
```
data/coco/train2014/
data/coco/val2014/
```
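After downloading, the expected layout can be checked with a short script. This is a minimal sketch: the directory list and the four evaluation annotation names are copied from the commands in this README. The training annotation file names are not listed here, so they are not checked. `missing_dirs` and `missing_eval_annotations` are hypothetical helpers.

```python
from pathlib import Path

# Directories this README asks for, relative to the repository root.
REQUIRED_DIRS = [
    "data/coco/annotations",
    "data/coco/train2014",
    "data/coco/val2014",
]

# Annotation files referenced by the evaluation commands in this README.
EVAL_ANNOTATIONS = [
    "instances_val2014_unseen_48_17.json",
    "instances_val2014_unseen_65_15.json",
    "instances_val2014_gzsi_48_17.json",
    "instances_val2014_gzsi_65_15.json",
]


def missing_dirs(root="."):
    """Return the required dataset directories that do not exist under root."""
    base = Path(root)
    return [d for d in REQUIRED_DIRS if not (base / d).is_dir()]


def missing_eval_annotations(root="."):
    """Return evaluation annotation files not yet placed in data/coco/annotations/."""
    ann_dir = Path(root) / "data" / "coco" / "annotations"
    return [name for name in EVAL_ANNOTATIONS if not (ann_dir / name).is_file()]


if __name__ == "__main__":
    print("missing directories:", missing_dirs() or "none")
    print("missing eval annotations:", missing_eval_annotations() or "none")
```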

Training:

48/17 split:

```
chmod +x tools/dist_train.sh
./tools/dist_train.sh configs/zsi/train/zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_decoder.py 4
```

65/15 split:

```
chmod +x tools/dist_train.sh
./tools/dist_train.sh configs/zsi/train/zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_65_15_decoder_notanh.py 4
```
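The two training runs differ only in the config file. The sketch below wraps them in a small launcher, with the config paths copied from the commands above. `train_command` and `run_training` are hypothetical helpers. The trailing `4` is assumed to be the number of GPUs, following the MMDetection-style `dist_train.sh` convention this script appears to use. Run the launcher from the repository root.

```python
import subprocess

# Training configs copied from this README, keyed by seen/unseen split.
TRAIN_CONFIGS = {
    "48/17": "configs/zsi/train/zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_decoder.py",
    "65/15": "configs/zsi/train/"
             "zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_65_15_decoder_notanh.py",
}


def train_command(split, gpus=4):
    """Build the dist_train.sh argument list for one split (gpus is assumed to be the GPU count)."""
    if split not in TRAIN_CONFIGS:
        raise ValueError("unknown split %r, expected one of %s" % (split, sorted(TRAIN_CONFIGS)))
    return ["./tools/dist_train.sh", TRAIN_CONFIGS[split], str(gpus)]


def run_training(split, gpus=4):
    """Launch training; raises CalledProcessError if the script exits with an error."""
    subprocess.run(train_command(split, gpus), check=True)
```

For example, `run_training("65/15")` runs the same command as the 65/15 block above.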

Inference & Evaluation:

ZSI task:

48/17 split ZSI task:

Download the 48/17 ZSI model and save it as checkpoints/ZSI_48_17.pth.

Inference:

```
chmod +x tools/dist_test.sh
./tools/dist_test.sh configs/zsi/48_17/test/zsi/zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_decoder.py checkpoints/ZSI_48_17.pth 4 --json_out results/zsi_48_17.json
```

Our results, zsi_48_17.bbox.json and zsi_48_17.segm.json, can also be downloaded from zsi_48_17_results.
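The evaluation commands use `results/zsi_48_17.bbox.json` and `results/zsi_48_17.segm.json`. Judging from those file names, `--json_out results/zsi_48_17.json` writes one file per result type, using the given path as a stem. The sketch below encodes that naming assumption. It also adds a quick sanity check of a results file, assuming the standard COCO results format: a JSON list of detections with `image_id`, `category_id` and `score`. `result_files` and `summarize_results` are hypothetical helpers.

```python
import json
import os
from collections import Counter


def result_files(json_out):
    """Map a --json_out path to the (bbox, segm) files the evaluation commands read."""
    stem, _ = os.path.splitext(json_out)
    return stem + ".bbox.json", stem + ".segm.json"


def summarize_results(path):
    """Count detections, distinct images and per-category detections in a results file."""
    with open(path) as f:
        dets = json.load(f)  # assumed: a list of {"image_id", "category_id", "score", ...}
    return {
        "detections": len(dets),
        "images": len({d["image_id"] for d in dets}),
        "categories": dict(Counter(d["category_id"] for d in dets)),
    }
```

For example, `summarize_results(result_files("results/zsi_48_17.json")[0])` inspects the bbox results before running the evaluation script.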

Evaluation:

For ZSD (zero-shot detection) performance:

```
python tools/zsi_coco_eval.py results/zsi_48_17.bbox.json --ann data/coco/annotations/instances_val2014_unseen_48_17.json
```

For ZSI performance:

```
python tools/zsi_coco_eval.py results/zsi_48_17.segm.json --ann data/coco/annotations/instances_val2014_unseen_48_17.json --types segm
```
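This README does not describe which metrics tools/zsi_coco_eval.py reports. As a rough point of comparison only, the sketch below runs the standard pycocotools evaluation on the same files. It assumes pycocotools is installed and that `--types segm` corresponds to COCOeval's `"segm"` IoU type. It is not a replacement for the repository's script, and its numbers may differ. `iou_type_for` and `coco_evaluate` are hypothetical helpers.

```python
def iou_type_for(result_file):
    """Pick the COCOeval iouType from the result file suffix used in this README."""
    if result_file.endswith(".bbox.json"):
        return "bbox"  # detection (ZSD)
    if result_file.endswith(".segm.json"):
        return "segm"  # instance masks (ZSI)
    raise ValueError("expected a .bbox.json or .segm.json file: %s" % result_file)


def coco_evaluate(result_file, ann_file):
    """Run standard COCO evaluation and return COCOeval.stats (12 summary numbers)."""
    # Imported here so iou_type_for() can be used without pycocotools installed.
    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    gt = COCO(ann_file)
    dt = gt.loadRes(result_file)
    evaluator = COCOeval(gt, dt, iou_type_for(result_file))
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints the AP/AR table
    return evaluator.stats
```

For example: `coco_evaluate("results/zsi_48_17.segm.json", "data/coco/annotations/instances_val2014_unseen_48_17.json")`.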

65/15 split ZSI task:

Download the 65/15 ZSI model and save it as checkpoints/ZSI_65_15.pth.

Inference:

```
chmod +x tools/dist_test.sh
./tools/dist_test.sh configs/zsi/65_15/test/zsi/zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_65_15_decoder_notanh.py checkpoints/ZSI_65_15.pth 4 --json_out results/zsi_65_15.json
```

Our results, zsi_65_15.bbox.json and zsi_65_15.segm.json, can also be downloaded from zsi_65_15_results.

Evaluation:

For ZSD (zero-shot detection) performance:

```
python tools/zsi_coco_eval.py results/zsi_65_15.bbox.json --ann data/coco/annotations/instances_val2014_unseen_65_15.json
```

For ZSI performance:

```
python tools/zsi_coco_eval.py results/zsi_65_15.segm.json --ann data/coco/annotations/instances_val2014_unseen_65_15.json --types segm
```

GZSI task:

48/17 split GZSI task:

Use the same model file, checkpoints/ZSI_48_17.pth, as in the ZSI task.

Inference:

```
chmod +x tools/dist_test.sh
./tools/dist_test.sh configs/zsi/48_17/test/gzsi/zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_decoder_gzsi.py checkpoints/ZSI_48_17.pth 4 --json_out results/gzsi_48_17.json
```

Our results, gzsi_48_17.bbox.json and gzsi_48_17.segm.json, can also be downloaded from gzsi_48_17_results.

Evaluation:

For GZSD (generalized zero-shot detection):

```
python tools/gzsi_coco_eval.py results/gzsi_48_17.bbox.json --ann data/coco/annotations/instances_val2014_gzsi_48_17.json --gzsi --num-seen-classes 48
```

For GZSI:

```
python tools/gzsi_coco_eval.py results/gzsi_48_17.segm.json --ann data/coco/annotations/instances_val2014_gzsi_48_17.json --gzsi --num-seen-classes 48 --types segm
```
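Generalized zero-shot results are commonly reported as a seen-class score, an unseen-class score and their harmonic mean (HM). This README does not say exactly how tools/gzsi_coco_eval.py uses `--num-seen-classes`. The sketch below therefore makes an illustrative assumption: the first `num_seen` per-class scores belong to seen classes and the rest to unseen classes. The real class ordering and the metrics the script computes may differ. All function names are hypothetical.

```python
def split_seen_unseen(per_class_scores, num_seen):
    """Split per-class scores into (seen, unseen).

    Assumption: the first num_seen entries are the seen classes, mirroring
    --num-seen-classes; the repository's actual ordering may differ.
    """
    return per_class_scores[:num_seen], per_class_scores[num_seen:]


def mean(values):
    """Arithmetic mean, 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def harmonic_mean(seen, unseen):
    """HM = 2 * seen * unseen / (seen + unseen), defined as 0.0 when both are 0."""
    if seen + unseen == 0:
        return 0.0
    return 2.0 * seen * unseen / (seen + unseen)


def gzsi_summary(per_class_scores, num_seen):
    """Average seen and unseen scores separately, then combine them with HM."""
    seen, unseen = split_seen_unseen(per_class_scores, num_seen)
    s, u = mean(seen), mean(unseen)
    return {"seen": s, "unseen": u, "hm": harmonic_mean(s, u)}
```

HM is used rather than a plain average because it stays low when either group scores poorly. A model that ignores unseen classes therefore cannot hide behind strong seen-class results.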

65/15 split GZSI task:

Use the same model file, checkpoints/ZSI_65_15.pth, as in the ZSI task.

Inference:

```
chmod +x tools/dist_test.sh
./tools/dist_test.sh configs/zsi/65_15/test/gzsi/zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_65_15_decoder_notanh_gzsi.py checkpoints/ZSI_65_15.pth 4 --json_out results/gzsi_65_15.json
```

Our results, gzsi_65_15.bbox.json and gzsi_65_15.segm.json, can also be downloaded from gzsi_65_15_results.

Evaluation:

For GZSD (generalized zero-shot detection):

```
python tools/gzsi_coco_eval.py results/gzsi_65_15.bbox.json --ann data/coco/annotations/instances_val2014_gzsi_65_15.json --gzsd --num-seen-classes 65
```

For GZSI:

```
python tools/gzsi_coco_eval.py results/gzsi_65_15.segm.json --ann data/coco/annotations/instances_val2014_gzsi_65_15.json --gzsd --num-seen-classes 65 --types segm
```
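The four test settings above differ only in config file, checkpoint and output path. The sketch below copies those values from this README into a table. A hypothetical helper, `inference_command`, builds the `tools/dist_test.sh` argument list from it. As with training, the `4` is assumed to be the GPU count. Note that both GZSI settings reuse the checkpoint of the matching ZSI split.

```python
import subprocess

# (task, split) -> (test config, checkpoint, --json_out path), copied from this README.
TEST_SETTINGS = {
    ("zsi", "48/17"): (
        "configs/zsi/48_17/test/zsi/zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_decoder.py",
        "checkpoints/ZSI_48_17.pth",
        "results/zsi_48_17.json",
    ),
    ("zsi", "65/15"): (
        "configs/zsi/65_15/test/zsi/"
        "zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_65_15_decoder_notanh.py",
        "checkpoints/ZSI_65_15.pth",
        "results/zsi_65_15.json",
    ),
    ("gzsi", "48/17"): (
        "configs/zsi/48_17/test/gzsi/"
        "zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_decoder_gzsi.py",
        "checkpoints/ZSI_48_17.pth",
        "results/gzsi_48_17.json",
    ),
    ("gzsi", "65/15"): (
        "configs/zsi/65_15/test/gzsi/"
        "zero-shot-mask-rcnn-BARPN-bbox_mask_sync_bg_65_15_decoder_notanh_gzsi.py",
        "checkpoints/ZSI_65_15.pth",
        "results/gzsi_65_15.json",
    ),
}


def inference_command(task, split, gpus=4):
    """Build the dist_test.sh argument list for one (task, split) setting."""
    config, checkpoint, json_out = TEST_SETTINGS[(task, split)]
    return ["./tools/dist_test.sh", config, checkpoint, str(gpus), "--json_out", json_out]


def run_inference(task, split, gpus=4):
    """Run inference from the repository root; raises if the script fails."""
    subprocess.run(inference_command(task, split, gpus), check=True)
```

For example, `run_inference("gzsi", "65/15")` runs the same command as the 65/15 GZSI inference step.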

License

ZSI is released under the MIT License.

Citing

If you use ZSI in your research or wish to refer to the baseline results published here, please use the following BibTeX entry:

```
@InProceedings{zhengye2021zsi,
  author    = {Zheng, Ye and Wu, Jiahong and Qin, Yongqiang and Zhang, Faen and Cui, Li},
  title     = {Zero-shot Instance Segmentation},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2021}
}
```


Owner

zhengye
