Probabilistic Cross-Modal Embedding (PCME) CVPR 2021


Official PyTorch implementation of PCME | Paper

Sanghyuk Chun¹, Seong Joon Oh¹, Rafael Sampaio de Rezende², Yannis Kalantidis², Diane Larlus²

¹ NAVER AI LAB
² NAVER LABS Europe


Updates

  • 23 Jun, 2021: Initial upload.

Installation

Install the dependencies using the following commands.

pip install cython && pip install -r requirements.txt
python -c 'import nltk; nltk.download("punkt", download_dir="/opt/conda/nltk_data")'
git clone https://github.com/NVIDIA/apex && cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
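
To verify the setup, the following sanity check (ours, not part of the repository) confirms that the main dependencies are importable:

import torch
import nltk
from apex import amp  # built with --cpp_ext and --cuda_ext above

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
nltk.data.find("tokenizers/punkt")  # raises LookupError if punkt was not downloaded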

Dockerfile

You can also use my Docker image:

docker pull sanghyukchun/pcme:torch1.2-apex-dali

Please add --model__cache_dir /vector_cache when you run the code.

Configuration

All experiments are based on configuration files (see config/coco and config/cub). If you want to change only a few options, instead of writing a new configuration file, you can override the configuration from the command line as follows:

python <train_or_eval_script>.py --dataloader__batch_size 32 --dataloader__eval_batch_size 8 --model__eval_method matching_prob

See config/parser.py for details.
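
For reference, each --section__key flag targets one entry of the nested configuration. A minimal sketch of the double-underscore convention (illustrative only; the actual logic lives in config/parser.py):

# Illustrative sketch: fold "--section__key value" overrides into a nested config.
def apply_overrides(config, overrides):
    for key, value in overrides.items():
        section, _, option = key.partition("__")
        config.setdefault(section, {})[option] = value
    return config

config = {"dataloader": {"batch_size": 128}, "model": {}}
apply_overrides(config, {"dataloader__batch_size": 32, "model__eval_method": "matching_prob"})
print(config)  # {'dataloader': {'batch_size': 32}, 'model': {'eval_method': 'matching_prob'}}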

Dataset preparation

COCO Caption

We followed the same split provided by VSE++. Dataset splits can be found in datasets/annotations.

Note that we also need instances_2014.json for computing the PMRP score.

CUB Caption

Download the images from this link, and download the captions from reedscot/cvpr2016. You can set the image path and the caption path separately in the code.

Evaluate pretrained models

NOTE: the current implementation of plausible match R-Precision (PMRP) is not efficient:
it first dumps the full ranked item list for each query to a local file, and then computes R-Precision from the dump.
We are planning to re-implement an efficient PMRP as soon as possible.
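
For reference, R-Precision for a query with R plausible matches is the fraction of those matches among the top-R ranked items. A minimal sketch under that definition (function name is ours, not the repository's):

# Minimal sketch: R-Precision for one query, given its full ranked item list
# and its set of plausible matches. PMRP averages this over all queries.
def r_precision(ranked_items, plausible_matches):
    r = len(plausible_matches)
    if r == 0:
        return 0.0
    return sum(item in plausible_matches for item in ranked_items[:r]) / r

print(r_precision(["a", "b", "c", "d"], {"a", "c"}))  # 0.5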

COCO Caption

# Compute recall metrics
python evaluate_recall_coco.py ./config/coco/pcme_coco.yaml \
    --dataset_root <path_to_coco_dataset> \
    --model_path model_last.pth \
    # --model__cache_dir /vector_cache # if you use my docker image

# Compute plausible match R-Precision (PMRP) metric
python extract_rankings_coco.py ./config/coco/pcme_coco.yaml \
    --dataset_root <path_to_coco_dataset> \
    --model_path model_last.pth \
    --dump_to <path_to_ranking_file> \
    # --model__cache_dir /vector_cache # if you use my docker image

python evaluate_pmrp_coco.py --ranking_file <path_to_ranking_file>
| Method | I2T PMRP | I2T R@1 | T2I PMRP | T2I R@1 | Model file |
|--------|----------|---------|----------|---------|------------|
| PCME | 45.0 | 68.8 | 46.0 | 54.6 | link |
| PVSE K=1 | 40.3 | 66.7 | 41.8 | 53.5 | - |
| PVSE K=2 | 42.8 | 69.2 | 43.6 | 55.2 | - |
| VSRN | 41.2 | 76.2 | 42.4 | 62.8 | - |
| VSRN + AOQ | 44.7 | 77.5 | 45.6 | 63.5 | - |

CUB Caption

python evaluate_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root <path_to_cub_images> \
    --caption_root <path_to_cub_captions> \
    --model_path model_last.pth \
    # --model__cache_dir /vector_cache # if you use my docker image

NOTE: if you just download the files from reedscot/cvpr2016, then caption_root will be cvpr2016_cub/text_c10.

If you want to test other probabilistic distances, such as Wasserstein distance or KL-divergence, try the following command:

python evaluate_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root <path_to_cub_images> \
    --caption_root <path_to_cub_captions> \
    --model_path model_last.pth \
    --model__eval_method <distance_method> \
    # --model__cache_dir /vector_cache # if you use my docker image

You can choose distance_method from ['elk', 'l2', 'min', 'max', 'wasserstein', 'kl', 'reverse_kl', 'js', 'bhattacharyya', 'matmul', 'matching_prob'].
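
As a point of reference for these options: PCME represents each embedding as a Gaussian with diagonal covariance, so several of the distances above have closed forms. For example, the squared 2-Wasserstein distance between N(μ₁, diag(σ₁²)) and N(μ₂, diag(σ₂²)) is ||μ₁ − μ₂||² + ||σ₁ − σ₂||². A minimal sketch (ours, not the repository's implementation):

import torch

# Squared 2-Wasserstein distance between diagonal Gaussians:
# W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2
def wasserstein2_sq(mu1, sigma1, mu2, sigma2):
    return ((mu1 - mu2) ** 2).sum(-1) + ((sigma1 - sigma2) ** 2).sum(-1)

mu1, sigma1 = torch.zeros(4), torch.ones(4)
mu2, sigma2 = torch.ones(4), 2 * torch.ones(4)
print(wasserstein2_sq(mu1, sigma1, mu2, sigma2))  # tensor(8.)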

How to train

NOTE: we train each model with mixed-precision training (O2) on a single V100.
Since the current code does not support multi-GPU training, you will have to reduce the batch size if your GPU has less memory than a V100.
Please note that, in that case, the reported results may not be exactly reproducible.
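
For context, O2 refers to the apex amp optimization level. A minimal sketch of the usual apex mixed-precision setup (the provided training scripts already handle this; shown only to clarify what O2 means, and it requires a CUDA device):

import torch
from apex import amp

model = torch.nn.Linear(8, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# opt_level="O2": FP16 model with FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

loss = model(torch.randn(4, 8).cuda()).mean()
with amp.scale_loss(loss, optimizer) as scaled_loss:  # loss scaling for FP16
    scaled_loss.backward()
optimizer.step()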

COCO Caption

python train_coco.py ./config/coco/pcme_coco.yaml --dataset_root <path_to_coco_dataset> \
    # --model__cache_dir /vector_cache # if you use my docker image

Training takes about 46 hours on a single V100 with mixed-precision training.

CUB Caption

We use the CUB Caption dataset (Reed, et al. 2016) as a new cross-modal retrieval benchmark. Here, instead of matching sparse paired image-caption data, we treat all image-caption pairs within the same class as positive (see the sketch after the references below). Since our split is based on the zero-shot learning benchmark (Xian, et al. 2017), we hold out 50 of the 200 bird classes for evaluation.

  • Reed, Scott, et al. "Learning deep representations of fine-grained visual descriptions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  • Xian, Yongqin, Bernt Schiele, and Zeynep Akata. "Zero-shot learning-the good, the bad and the ugly." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
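
Since this class-level protocol differs from the usual instance-level matching, here is a small illustrative sketch (ours, with made-up class ids) of how the positive set is defined:

import numpy as np

# Under the class-level protocol, an image-caption pair counts as positive
# whenever the image class and the caption class coincide.
image_classes = np.array([0, 0, 1, 2])    # class id per image
caption_classes = np.array([0, 1, 1, 2])  # class id per caption
positive = image_classes[:, None] == caption_classes[None, :]  # (n_images, n_captions)
print(positive.astype(int))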

Hyperparameter search

We additionally use the cross-validation splits of (Xian, et al. 2017), namely 100 classes for training and 50 classes for validation.

python train_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root <path_to_cub_images> \
    --caption_root <path_to_cub_captions> \
    --dataset_name cub_trainval1 \
    # --model__cache_dir /vector_cache # if you use my docker image

Similarly, you can use cub_trainval2 and cub_trainval3.

Training with the full training classes

python train_cub.py ./config/cub/pcme_cub.yaml \
    --dataset_root <path_to_cub_images> \
    --caption_root <path_to_cub_captions> \
    # --model__cache_dir /vector_cache # if you use my docker image

Training takes about 4 hours on a single V100 with mixed-precision training.

How to cite

@inproceedings{chun2021pcme,
    title={Probabilistic Embeddings for Cross-Modal Retrieval},
    author={Chun, Sanghyuk and Oh, Seong Joon and De Rezende, Rafael Sampaio and Kalantidis, Yannis and Larlus, Diane},
    year={2021},
    booktitle={Conference on Computer Vision and Pattern Recognition (CVPR)},
}

License

MIT License

Copyright (c) 2021-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.