
Verbs in COCO (V-COCO) Dataset

This repository hosts the Verbs in COCO (V-COCO) dataset and associated code to evaluate models for the Visual Semantic Role Labeling (VSRL) task, as described in this technical report.

Citing

If you find this dataset or code base useful in your research, please consider citing the following papers:

@article{gupta2015visual,
  title={Visual Semantic Role Labeling},
  author={Gupta, Saurabh and Malik, Jitendra},
  journal={arXiv preprint arXiv:1505.04474},
  year={2015}
}

@incollection{lin2014microsoft,
  title={Microsoft COCO: Common objects in context},
  author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
  booktitle={Computer Vision--ECCV 2014},
  pages={740--755},
  year={2014},
  publisher={Springer}
}

Installation

  1. Clone the repository (recursively, so that the COCO API is included).

    git clone --recursive https://github.com/s-gupta/v-coco.git
  2. This dataset builds on MS COCO; please download the MS COCO images and annotations.

  3. The current V-COCO release only uses a subset of MS COCO images (image IDs listed in data/splits/vcoco_all.ids). Use the following script to pick out the relevant annotations from the COCO annotations, which allows faster loading in V-COCO.

    # Assume you cloned the repository to `VCOCO_DIR`
    cd $VCOCO_DIR
    # If you downloaded coco annotations to coco-data/annotations
    python script_pick_annotations.py coco-data/annotations
  4. Build coco/PythonAPI/pycocotools/_mask.so and cython_bbox.so.

    # Assume you cloned the repository to `VCOCO_DIR`
    cd $VCOCO_DIR/coco/PythonAPI/ && make
    cd $VCOCO_DIR && make

Using the dataset

  1. An IPython notebook illustrating how to use the annotations in the dataset is available in V-COCO.ipynb; a short loading sketch also follows this list.
  2. The current release of the dataset includes annotations as indicated in Table 1 of the paper. We are collecting role annotations for the 6 categories that are currently missing and will make them public shortly.
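
For reference, the loading pattern used in V-COCO.ipynb looks roughly like the sketch below. The vsrl_utils helper names (load_coco, load_vcoco, attach_gt_boxes) and the 'action_name' key are assumed from that notebook and may differ across versions; run it from the repository root after completing the installation steps.

# Minimal loading sketch, assuming the helpers used in V-COCO.ipynb.
import vsrl_utils as vu

coco = vu.load_coco()                       # COCO annotations for the V-COCO image subset
vcoco_train = vu.load_vcoco('vcoco_train')  # one entry per action category
for x in vcoco_train:
    x = vu.attach_gt_boxes(x, coco)         # attach person/role boxes to the annotations

print([x['action_name'] for x in vcoco_train])  # list the annotated action names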

Evaluation

We provide evaluation code that computes agent AP and role AP, as explained in the paper.

In order to use the evaluation code, store your predictions as a pickle file (.pkl) in the following format:

[ {'image_id':        # the COCO image id,
   'person_box':      # [x1, y1, x2, y2], the box prediction for the person,
   '[action]_agent':  # the score for the action corresponding to the person prediction,
   '[action]_[role]': # [x1, y1, x2, y2, s], the predicted box for the role and
                      # the associated score for the action-role pair.
   } ]
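
For illustration, one way such a detections file could be assembled and saved is sketched below; the action name (hit), role name (instr), image id, boxes, and scores are made-up placeholders, not part of the evaluation API.

# Illustrative sketch: assemble predictions in the expected format and pickle them.
# The action/role names and all numbers below are placeholders.
import pickle
import numpy as np

detections = [{
    'image_id': 165,                                     # COCO image id (placeholder)
    'person_box': np.array([50.0, 30.0, 200.0, 300.0]),  # [x1, y1, x2, y2]
    'hit_agent': 0.87,                                    # agent score for the 'hit' action
    'hit_instr': np.array([180.0, 120.0, 260.0, 180.0, 0.74]),  # role box and score
}]

with open('/path/to/detections/detections.pkl', 'wb') as f:
    pickle.dump(detections, f)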

Assuming your detections are stored in det_file=/path/to/detections/detections.pkl, do

from vsrl_eval import VCOCOeval
vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)
  # e.g. vsrl_annot_file: data/vcoco/vcoco_val.json
  #      coco_file:       data/instances_vcoco_all_2014.json
  #      split_file:      data/splits/vcoco_val.ids
vcocoeval._do_eval(det_file, ovr_thresh=0.5)

We introduce two scenarios for role AP evaluation.

  1. [Scenario 1] For test cases with missing role annotations, an agent-role prediction is correct if the action is correct, the overlap between the person boxes is >0.5, and the corresponding predicted role is empty, e.g. [0,0,0,0] or [NaN,NaN,NaN,NaN]. This scenario suits roles that are missing due to occlusion.

  2. [Scenario 2] For test cases with missing role annotations, an agent-role prediction is correct if the action is correct and the overlap between the person boxes is >0.5 (the corresponding role prediction is ignored). This scenario suits cases where the role falls outside the COCO categories. A code sketch of both rules follows.
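
The sketch below restates the two rules for a single matched agent. It is only an illustration of the matching logic described above, under the stated assumptions, not the actual implementation in vsrl_eval.py.

# Illustration of the missing-role handling described above (not the code in vsrl_eval.py).
import numpy as np

def role_is_empty(box):
    # A role box counts as empty if it is all zeros or all NaN.
    box = np.asarray(box, dtype=float)[:4]
    return bool(np.all(box == 0) or np.all(np.isnan(box)))

def role_correct(pred_role_box, gt_role_box, role_iou, scenario, ovr_thresh=0.5):
    # Assumes the agent already matched: correct action and person-box overlap > 0.5.
    if role_is_empty(gt_role_box):           # ground-truth role annotation is missing
        if scenario == 2:
            return True                      # Scenario 2: the role prediction is ignored
        return role_is_empty(pred_role_box)  # Scenario 1: prediction must also be empty
    return role_iou >= ovr_thresh            # otherwise require sufficient role overlap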
