[CVPR 2021] MiVOS - Scribble to Mask module

Overview

MiVOS (CVPR 2021) - Scribble To Mask

Ho Kei Cheng, Yu-Wing Tai, Chi-Keung Tang

[arXiv] [Paper PDF] [Project Page]

A simplistic network that turns scribbles to mask. It supports multi-object segmentation using soft-aggregation. Don't expect SOTA results from this model!

Ex1 Ex2

Overall structure and capabilities

MiVOS Mask-Propagation Scribble-to-Mask
DAVIS/YouTube semi-supervised evaluation ✔️
DAVIS interactive evaluation ✔️
User interaction GUI tool ✔️
Dense Correspondences ✔️
Train propagation module ✔️
Train S2M (interaction) module ✔️
Train fusion module ✔️
Generate more synthetic data ✔️

Requirements

The package versions shown here are the ones that I used. You might not need the exact versions.

Refer to the official PyTorch guide for installing PyTorch/torchvision. The rest can be installed by:

pip install opencv-contrib-python gitpython gdown

Pretrained model

Download and put the model in ./saves/. Alternatively use the provided download_model.py.

[OneDrive Mirror]

Interactive GUI

python interactive.py --image <image>

Controls:

Mouse Left - Draw scribbles
Mouse middle key - Switch positive/negative
Key f - Commit changes, clear scribbles
Key r - Clear everything
Key d - Switch between overlay/mask view
Key s - Save masks into a temporary output folder (./output/)

Known issues

The model almost always needs to focus on at least one object. It is very difficult to erase all existing masks from an image using scribbles.

Training

Datasets

  1. Download and extract LVIS training set.
  2. Download and extract a set of static image segmentation datasets. These are already downloaded for you if you used the download_datasets.py in Mask-Propagation.
├── lvis
│   ├── lvis_v1_train.json
│   └── train2017
├── Scribble-to-Mask
└── static
    ├── BIG_small
    └── ...

Commands

Use the deeplabv3plus_resnet50 pretrained model provided here.

CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9842 --nproc_per_node=2 train.py --id s2m --load_deeplab <path_to_deeplab.pth>

Credit

Deeplab implementation and pretrained model: https://github.com/VainF/DeepLabV3Plus-Pytorch.

Citation

Please cite our paper if you find this repo useful!

@inproceedings{MiVOS_2021,
  title={Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion},
  author={Cheng, Ho Kei and Tai, Yu-Wing and Tang, Chi-Keung},
  booktitle={CVPR},
  year={2021}
}

Contact: [email protected]

Comments
  • AttributeError: Caught AttributeError in DataLoader worker process 0

    AttributeError: Caught AttributeError in DataLoader worker process 0

    Hello! I followed the instructions of the training command, it has thrown an error about AttributeError. dataloader_error I put the static folder outside this repository as you mentioned. It is confusing that I can use the same datasets for the pretraining propagation module, the train.py in Mask-Propagation works fine.

    opened by xwhkkk 2
  • git.exc.InvalidGitRepositoryError when running train.py

    git.exc.InvalidGitRepositoryError when running train.py

    Hello! I followed the instruction of the training command, but it has thrown an error about GitRepositoryError. gitError I used command : CUDA_VISIBLE_DEVICES=0,1 OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 1842 --nproc_per_node=2 train.py --id s2m --load_deeplab ./deeplab_resnet50/best_deeplabv3plus_resnet50_voc_os16.pth, and I have 2 GPUs. Could you give me some suggestions?

    opened by xwhkkk 2
  • About evaluation of the model

    About evaluation of the model

    Hi,

    thank you for the nice work.

    I have a concern about the evaluation of the model. Because there is no validation set to pick the best model. It may has a potential overfitting problem. (Or what should the validation set for interactive segmentation look like? If there is a unified standard, it will be more helpful for everyone to compare their methods.)

    In interactive object segmentation setting, is this setting popular? I am new here for the interactive segmentation. Wish to solve my concern, thank you.

    opened by Limingxing00 2
  • Question about Local Control Strategy

    Question about Local Control Strategy

    A simple but practical segmentation tool! I've read your paper, and it says that local control strategy is used in S2M. However, I don't find the local control step in this code. Why don't you provide it in this tool? Will local control make significant difference to the performance?

    opened by distillation-dcf 1
  • DeepLabv3 pre-trained models

    DeepLabv3 pre-trained models

    Hello,

    I wanted to mention that in order to train S2M from scratch, using the deeplabv3_resnet50 pre-trained model provided in this repo, returns the following error: KeyError: 'classifier.classifier.0.convs.0.0.weight. Meaning that the weights from this layer are not present in deeplabv3_resnet50. But using the deeplabv3plus_resnet50 from the same repo executes without errors.

    Best!

    opened by UndecidedBoy 1
  • saving error

    saving error

    Hello! Thanks for sharing your code. When I run python interactive.py and want to save the masks, appeared following error.

    image

    Could you give me some suggestions?

    opened by xwhkkk 3
  • Fix simple issues and allow for cpu only use

    Fix simple issues and allow for cpu only use

    I had to make some changes to be able to use the code on cpu only system and had troubles saving the mask from the interactive GUI and fixed it. Thanks for the great work.

    opened by rami-alloush 3
Releases(1.0)
Temporally Coherent GAN SIGGRAPH project.

TecoGAN This repository contains source code and materials for the TecoGAN project, i.e. code for a TEmporally COherent GAN for video super-resolution

Duc Linh Nguyen 2 Jan 18, 2022
JAX + dataclasses

jax_dataclasses jax_dataclasses provides a wrapper around dataclasses.dataclass for use in JAX, which enables automatic support for: Pytree registrati

Brent Yi 35 Dec 21, 2022
GraphGT: Machine Learning Datasets for Graph Generation and Transformation

GraphGT: Machine Learning Datasets for Graph Generation and Transformation Dataset Website | Paper Installation Using pip To install the core environm

y6q9 50 Aug 18, 2022
Magisk module to enable hidden features on Android 12 Developer Preview 1.

Android 12 Extensions This is a Magisk module that enables hidden features on Android 12 Developer Preview 1. Features Scrolling screenshots Wallpaper

Danny Lin 384 Jan 06, 2023
This repository contains the segmentation user interface from the OpenSurfaces project, extracted as a lightweight tool

OpenSurfaces Segmentation UI This repository contains the segmentation user interface from the OpenSurfaces project, extracted as a lightweight tool.

Sean Bell 66 Jul 11, 2022
Diverse Branch Block: Building a Convolution as an Inception-like Unit

Diverse Branch Block: Building a Convolution as an Inception-like Unit (PyTorch) (CVPR-2021) DBB is a powerful ConvNet building block to replace regul

253 Dec 24, 2022
A library for performing coverage guided fuzzing of neural networks

TensorFuzz: Coverage Guided Fuzzing for Neural Networks This repository contains a library for performing coverage guided fuzzing of neural networks,

Brain Research 195 Dec 28, 2022
DeiT: Data-efficient Image Transformers

DeiT: Data-efficient Image Transformers This repository contains PyTorch evaluation code, training code and pretrained models for DeiT (Data-Efficient

Facebook Research 3.2k Jan 06, 2023
ViSD4SA, a Vietnamese Span Detection for Aspect-based sentiment analysis dataset

UIT-ViSD4SA PACLIC 35 General Introduction This repository contains the data of the paper: Span Detection for Vietnamese Aspect-Based Sentiment Analys

Nguyễn Thị Thanh Kim 5 Nov 13, 2022
EmoTag helps you train emotion detection model for Chinese audios

emoTag emoTag helps you train emotion detection model for Chinese audios. Environment pip install -r requirement.txt Data We used Emotional Speech Dat

_zza 4 Sep 07, 2022
[Preprint] ConvMLP: Hierarchical Convolutional MLPs for Vision, 2021

Convolutional MLP ConvMLP: Hierarchical Convolutional MLPs for Vision Preprint link: ConvMLP: Hierarchical Convolutional MLPs for Vision By Jiachen Li

SHI Lab 143 Jan 03, 2023
Open-source code for Generic Grouping Network (GGN, CVPR 2022)

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity Pytorch implementation for "Open-World Instance Segmen

Meta Research 99 Dec 06, 2022
A general framework for deep learning experiments under PyTorch based on pytorch-lightning

torchx Torchx is a general framework for deep learning experiments under PyTorch based on pytorch-lightning. TODO list gan-like training wrapper text

Yingtian Liu 6 Mar 17, 2022
QAHOI: Query-Based Anchors for Human-Object Interaction Detection (paper)

QAHOI QAHOI: Query-Based Anchors for Human-Object Interaction Detection (paper) Requirements PyTorch = 1.5.1 torchvision = 0.6.1 pip install -r requ

38 Dec 29, 2022
PyTorch implementation of HDN(Homography Decomposition Networks) for planar object tracking

Homography Decomposition Networks for Planar Object Tracking This project is the offical PyTorch implementation of HDN(Homography Decomposition Networ

CaptainHook 48 Dec 15, 2022
Graduation Project

Gesture-Detection-and-Depth-Estimation This is my graduation project. (1) In this project, I use the YOLOv3 object detection model to detect gesture i

ChaosAT 1 Nov 23, 2021
PyTorch implementation of convolutional neural networks-based text-to-speech synthesis models

Deepvoice3_pytorch PyTorch implementation of convolutional networks-based text-to-speech synthesis models: arXiv:1710.07654: Deep Voice 3: Scaling Tex

Ryuichi Yamamoto 1.8k Jan 08, 2023
List of papers, code and experiments using deep learning for time series forecasting

Deep Learning Time Series Forecasting List of state of the art papers focus on deep learning and resources, code and experiments using deep learning f

Alexander Robles 2k Jan 06, 2023
Spline is a tool that is capable of running locally as well as part of well known pipelines like Jenkins (Jenkinsfile), Travis CI (.travis.yml) or similar ones.

Welcome to spline - the pipeline tool Important note: Since change in my job I didn't had the chance to continue on this project. My main new project

Thomas Lehmann 29 Aug 22, 2022
RuDOLPH: One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP

[Paper] [Хабр] [Model Card] [Colab] [Kaggle] RuDOLPH 🦌 🎄 ☃️ One Hyper-Modal Transformer can be creative as DALL-E and smart as CLIP Russian Diffusio

AI Forever 232 Jan 04, 2023