
Verbs in COCO (V-COCO) Dataset

This repository hosts the Verbs in COCO (V-COCO) dataset and associated code to evaluate models for the Visual Semantic Role Labeling (VSRL) task, as described in this technical report.

Citing

If you find this dataset or code base useful in your research, please consider citing the following papers:

@article{gupta2015visual,
  title={Visual Semantic Role Labeling},
  author={Gupta, Saurabh and Malik, Jitendra},
  journal={arXiv preprint arXiv:1505.04474},
  year={2015}
}

@incollection{lin2014microsoft,
  title={Microsoft COCO: Common objects in context},
  author={Lin, Tsung-Yi and Maire, Michael and Belongie, Serge and Hays, James and Perona, Pietro and Ramanan, Deva and Doll{\'a}r, Piotr and Zitnick, C Lawrence},
  booktitle={Computer Vision--ECCV 2014},
  pages={740--755},
  year={2014},
  publisher={Springer}
}

Installation

  1. Clone the repository (recursively, so that the COCO API is included).

    git clone --recursive https://github.com/s-gupta/v-coco.git
  2. This dataset builds on MS COCO; please download the MS COCO images and annotations.

  3. The current V-COCO release only uses a subset of MS COCO images (image IDs listed in data/splits/vcoco_all.ids). Use the following script to pick out the relevant annotations from the COCO annotations, which allows faster loading in V-COCO.

    # Assume you cloned the repository to `VCOCO_DIR`
    cd $VCOCO_DIR
    # If you downloaded coco annotations to coco-data/annotations
    python script_pick_annotations.py coco-data/annotations
  4. Build coco/PythonAPI/pycocotools/_mask.so and cython_bbox.so.

    # Assume you cloned the repository to `VCOCO_DIR`
    cd $VCOCO_DIR/coco/PythonAPI/ && make
    cd $VCOCO_DIR && make

Using the dataset

  1. An IPython notebook illustrating how to use the annotations in the dataset is available in V-COCO.ipynb; a short loading sketch also follows this list.
  2. The current release of the dataset includes annotations as indicated in Table 1 of the paper. We are collecting role annotations for the 6 categories that are currently missing and will make them public shortly.
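
For reference, the loading pattern used in V-COCO.ipynb looks roughly like the sketch below. The vsrl_utils helper names (load_coco, load_vcoco, attach_gt_boxes) and the 'action_name' key are assumed from that notebook and may differ across versions; run it from the repository root after completing the installation steps.

# Minimal loading sketch, assuming the helpers used in V-COCO.ipynb.
import vsrl_utils as vu

coco = vu.load_coco()                       # COCO annotations for the V-COCO image subset
vcoco_train = vu.load_vcoco('vcoco_train')  # one entry per action category
for x in vcoco_train:
    x = vu.attach_gt_boxes(x, coco)         # attach person/role boxes to the annotations

print([x['action_name'] for x in vcoco_train])  # list the annotated action names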

Evaluation

We provide evaluation code that computes agent AP and role AP, as explained in the paper.

In order to use the evaluation code, store your predictions as a pickle file (.pkl) in the following format:

[ {'image_id':        # the COCO image id,
   'person_box':      # [x1, y1, x2, y2], the box prediction for the person,
   '[action]_agent':  # the score for the action corresponding to the person prediction,
   '[action]_[role]': # [x1, y1, x2, y2, s], the predicted box for the role and
                      # the associated score for the action-role pair.
   } ]
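
For illustration, one way such a detections file could be assembled and saved is sketched below; the action name (hit), role name (instr), image id, boxes, and scores are made-up placeholders, not part of the evaluation API.

# Illustrative sketch: assemble predictions in the expected format and pickle them.
# The action/role names and all numbers below are placeholders.
import pickle
import numpy as np

detections = [{
    'image_id': 165,                                     # COCO image id (placeholder)
    'person_box': np.array([50.0, 30.0, 200.0, 300.0]),  # [x1, y1, x2, y2]
    'hit_agent': 0.87,                                    # agent score for the 'hit' action
    'hit_instr': np.array([180.0, 120.0, 260.0, 180.0, 0.74]),  # role box and score
}]

with open('/path/to/detections/detections.pkl', 'wb') as f:
    pickle.dump(detections, f)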

Assuming your detections are stored in det_file=/path/to/detections/detections.pkl, do

from vsrl_eval import VCOCOeval
vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)
  # e.g. vsrl_annot_file: data/vcoco/vcoco_val.json
  #      coco_file:       data/instances_vcoco_all_2014.json
  #      split_file:      data/splits/vcoco_val.ids
vcocoeval._do_eval(det_file, ovr_thresh=0.5)

We introduce two scenarios for role AP evaluation.

  1. [Scenario 1] For test cases with missing role annotations, an agent-role prediction is correct if the action is correct, the overlap between the person boxes is >0.5, and the corresponding predicted role is empty, e.g. [0,0,0,0] or [NaN,NaN,NaN,NaN]. This scenario suits roles that are missing due to occlusion.

  2. [Scenario 2] For test cases with missing role annotations, an agent-role prediction is correct if the action is correct and the overlap between the person boxes is >0.5 (the corresponding role prediction is ignored). This scenario suits cases where the role falls outside the COCO categories. A code sketch of both rules follows.
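
The sketch below restates the two rules for a single matched agent. It is only an illustration of the matching logic described above, under the stated assumptions, not the actual implementation in vsrl_eval.py.

# Illustration of the missing-role handling described above (not the code in vsrl_eval.py).
import numpy as np

def role_is_empty(box):
    # A role box counts as empty if it is all zeros or all NaN.
    box = np.asarray(box, dtype=float)[:4]
    return bool(np.all(box == 0) or np.all(np.isnan(box)))

def role_correct(pred_role_box, gt_role_box, role_iou, scenario, ovr_thresh=0.5):
    # Assumes the agent already matched: correct action and person-box overlap > 0.5.
    if role_is_empty(gt_role_box):           # ground-truth role annotation is missing
        if scenario == 2:
            return True                      # Scenario 2: the role prediction is ignored
        return role_is_empty(pred_role_box)  # Scenario 1: prediction must also be empty
    return role_iou >= ovr_thresh            # otherwise require sufficient role overlap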
