Code and data for "Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning" (EMNLP 2021).

Related tags

Deep LearningGD-VCR
Overview

GD-VCR

Code for Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning (EMNLP 2021).

Research Questions and Aims:

  1. How well can a model perform on the images which requires geo-diverse commonsense to understand?
  2. What are the reasons behind performance disparity on Western and non-Western images?
  3. We aim to broaden researchers' vision on a realistic issue existing all over the world, and call upon researchers to consider more inclusive commonsense knowledge and better model transferability on various cultures.

In this repo, GD-VCR dataset and codes about 1) general model evaluation, 2) detailed controlled experiments, and 3) dataset construction are provided.

Repo Structure

GD-VCR
 ├─X_VCR				  --> storing GD-VCR/VCR data
 ├─configs
 │  └─vcr
 │     └─fine-tune-qa.json		  --> part of configs for evaluation
 ├─dataloaders
 │  └─vcr.py			          --> load GD-VCR/VCR data based on configs
 ├─models
 │  └─train.py		                  --> fine-tune/evaluate models
 │
 ├─val.jsonl			          --> GD-VCR dataset
 ├─val_addition_single.jsonl		  --> additional low-order QA pairs

GD-VCR dataset

First download the original VCR dataset to X_VCR:

cd X_VCR
wget https://s3.us-west-2.amazonaws.com/ai2-rowanz/vcr1annots.zip
wget https://s3.us-west-2.amazonaws.com/ai2-rowanz/vcr1images.zip
unzip vcr1annots.zip
unzip vcr1images.zip

Then download the GD-VCR dataset to X_VCR:

cd X_VCR
mv val.jsonl orig_val.jsonl
wget https://gdvcr.s3.us-west-1.amazonaws.com/MC-VCR_sample.zip
unzip MC-VCR_sample.zip

cd ..
mv val.jsonl X_VCR/
mv val_addition_single.jsonl X_VCR/

The detailed items in our GD-VCR dataset are almost the same as VCR. Please refer to VCR website for detailed explanations.

VisualBERT

Prepare Environment

Prepare environment as mentioned in the original repo of VisualBERT.

Fine-tune model on original VCR

Download the task-specific pre-trained checkpoint on original VCR vcr_pre_train.th to GD-VCR/visualbert/trained_models.

Then, use the command to fine-tune:

export PYTHONPATH=$PYTHONPATH:GD-VCR/visualbert/
export PYTHONPATH=$PYTHONPATH:GD-VCR/

cd GD-VCR/visualbert/models

CUDA_VISIBLE_DEVICES=0 python train.py -folder ../trained_models -config ../configs/vcr/fine-tune-qa.json

For convenience, we provide a trained checkpoint [Link] for quick evaluation.

Evaluation on GD-VCR

CUDA_VISIBLE_DEVICES=0 python train.py -folder ../trained_models -config ../configs/vcr/eval.json \
        [-region REGION] \
        [-scene SCENE] \
        [-single_or_multiple SINGLE_OR_MULTIPLE] \
        [-orig_or_new ORIG_OR_NEW] \
	[-addition_annotation_analysis] \
        [-grounding]

Here are the explanations of several important attributions:

  • REGION: One of the regions west, east-asia, south-asia, africa.
  • SCENE: One of the scenario (e.g., wedding).
  • SINGLE_OR_MULTIPLE: Whether studying single(low-order) or multiple(high-order) cognitive questions.
  • addition_annotation_analysis: Whether studying GD-VCR or additional annotated questions. If yes, you can choose to set SINGLE_OR_MULTIPLE to specify which types of questions you want to investigate.
  • ORIG_OR_NEW: Whether studying GD-VCR or original VCR dev set.
  • grounding: Whether analyzing grounding results by visualizing attention weights.

Given our fine-tuned VisualBERT model above, the evaluation results are shown below:

Models Overall West South Asia East Asia Africa
VisualBERT 53.27 **62.91** 52.04 45.39 51.85

ViLBERT

Prepare Environment

Prepare environment as mentioned in the original repo of ViLBERT.

Extract image features

We make use of the docker made for LXMERT. Detailed commands are shown below:

cd GD-VCR
git clone https://github.com/jiasenlu/bottom-up-attention.git
mv generate_tsv.py bottom-up-attention/tools
mv generate_tsv_gt.py bottom-up-attention/tools

docker pull airsplay/bottom-up-attention
docker run --name gd_vcr --runtime=nvidia -it -v /PATH/TO/:/PATH/TO/ airsplay/bottom-up-attention /bin/bash
[Used to enter into the docker]

cd /PATH/TO/GD-VCR/bottom-up-attention
pip install json_lines
pip install jsonlines
pip install python-dateutil==2.5.0

python ./tools/generate_tsv.py --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --def models/vg/ResNet-101/faster_rcnn_end2end_final/test.prototxt --out ../vilbert_beta/feature/VCR/VCR_resnet101_faster_rcnn_genome.tsv --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel --total_group 1 --group_id 0 --split VCR
python ./tools/generate_tsv_gt.py --cfg experiments/cfgs/faster_rcnn_end2end_resnet.yml --def models/vg/ResNet-101/faster_rcnn_end2end_final/test_gt.prototxt --out ../vilbert_beta/feature/VCR/VCR_gt_resnet101_faster_rcnn_genome.tsv --net data/faster_rcnn_models/resnet101_faster_rcnn_final.caffemodel --total_group 1 --group_id 0 --split VCR_gt
[Used to extract features]

Then, exit the dockerfile, and convert extracted features into lmdb form:

cd GD-VCR/vilbert_beta
python script/convert_lmdb_VCR.py
python script/convert_lmdb_VCR_gt.py

Fine-tune model on original VCR

Download the pre-trained checkpoint to GD-VCR/vilbert_beta/save/bert_base_6_layer_6_connect_freeze_0/.

Then, use the command to fine-tune:

cd GD-VCR/vilbert_beta
python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin  --config_file config/bert_base_6layer_6conect.json  --learning_rate 2e-5 --num_workers 16 --tasks 1-2 --save_name pretrained

For convenience, we provide a trained checkpoint [Link] for quick evaluation.

Evaluation on GD-VCR

CUDA_VISIBLE_DEVICES=0,1 python eval_tasks.py 
		--bert_model bert-base-uncased 
		--from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/vilbert_best.bin 
		--config_file config/bert_base_6layer_6conect.json --task 1 --split val  --batch_size 16

Note that if you want the results on original VCR dev set, you could directly change the "val_annotations_jsonpath" value of TASK1 to X_VCR/orig_val.jsonl.

Given our fine-tuned ViLBERT model above, the evaluation results are shown below:

Models Overall West South Asia East Asia Africa
ViLBERT 58.47 **65.82** 62.90 46.45 62.04

Dataset Construction

Here we provide dataset construction methods in our paper:

  • similarity.py: Compute the similarity among answer candidates and distribute candidates to each annotated questions.
  • relevance_model.py: Train a model to compute the relevance between question and answer.
  • question_cluster.py: Infer question templates from original VCR dataset as the basis of annotation.

For sake of convenience, we provide the trained relevance computation model [Link].

Acknowledgement

We thank for VisualBERT, ViLBERT, and Detectron authors' implementation. Also, we appreciate the effort of original VCR paper's author, and our work is highly influenced by VCR.

Citation

Please cite our EMNLP paper if this repository inspired your work.

@inproceedings{yin2021broaden,
  title = {Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning},
  author = {Yin, Da and Li, Liunian Harold and Hu, Ziniu and Peng, Nanyun and Chang, Kai-Wei},
  booktitle = {EMNLP},
  year = {2021}
}
Owner
Da Yin
Da Yin
QA-GNN: Question Answering using Language Models and Knowledge Graphs

QA-GNN: Question Answering using Language Models and Knowledge Graphs This repo provides the source code & data of our paper: QA-GNN: Reasoning with L

Michihiro Yasunaga 434 Jan 04, 2023
PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and Multi-Step Knowledge Distillation

PocketNet This is the official repository of the paper: PocketNet: Extreme Lightweight Face Recognition Network using Neural Architecture Search and M

Fadi Boutros 40 Dec 22, 2022
Final project code: Implementing MAE with downscaled encoders and datasets, for ESE546 FA21 at University of Pennsylvania

546 Final Project: Masked Autoencoder Haoran Tang, Qirui Wu 1. Training To train the network, please run mae_pretraining.py. Please modify folder path

Haoran Tang 0 Apr 22, 2022
Implementation of the Chamfer Distance as a module for pyTorch

Chamfer Distance for pyTorch This is an implementation of the Chamfer Distance as a module for pyTorch. It is written as a custom C++/CUDA extension.

Christian Diller 205 Jan 05, 2023
[AAAI 2022] Separate Contrastive Learning for Organs-at-Risk and Gross-Tumor-Volume Segmentation with Limited Annotation

A paper Introduction This is an official release of the paper Separate Contrastive Learning for Organs-at-Risk and Gross-Tumor-Volume Segmentation wit

Jiacheng Wang 14 Dec 08, 2022
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis

Bilateral Denoising Diffusion Models (BDDMs) This is the official PyTorch implementation of the following paper: BDDM: BILATERAL DENOISING DIFFUSION M

172 Dec 23, 2022
Run Effective Large Batch Contrastive Learning on Limited Memory GPU

Gradient Cache Gradient Cache is a simple technique for unlimitedly scaling contrastive learning batch far beyond GPU memory constraint. This means tr

Luyu Gao 198 Dec 29, 2022
[NeurIPS 2021] Garment4D: Garment Reconstruction from Point Cloud Sequences

Garment4D [PDF] | [OpenReview] | [Project Page] Overview This is the codebase for our NeurIPS 2021 paper Garment4D: Garment Reconstruction from Point

Fangzhou Hong 112 Dec 23, 2022
Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt)

Deep Learning for Natural Language Processing SS 2021 (TU Darmstadt) Task Training huge unsupervised deep neural networks yields to strong progress in

2 Aug 05, 2022
Progressive Image Deraining Networks: A Better and Simpler Baseline

Progressive Image Deraining Networks: A Better and Simpler Baseline [arxiv] [pdf] [supp] Introduction This paper provides a better and simpler baselin

190 Dec 01, 2022
Algorithmic trading with deep learning experiments

Deep-Trading Algorithmic trading with deep learning experiments. Now released part one - simple time series forecasting. I plan to implement more soph

Alex Honchar 1.4k Jan 02, 2023
UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning

UniMoCo: Unsupervised, Semi-Supervised and Full-Supervised Visual Representation Learning This is the official PyTorch implementation for UniMoCo pape

dddzg 49 Jan 02, 2023
Nonnegative spatial factorization for multivariate count data

Nonnegative spatial factorization for multivariate count data This repository contains supporting code to facilitate reproducible analysis. For detail

Will Townes 24 Dec 19, 2022
The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting

About The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting The demo program was only tested under Conda in a standard

Anh-Dzung Doan 5 Nov 28, 2022
CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces

CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-agent Path Planning in Continuous Spaces This is a repository for the following pape

17 Oct 13, 2022
Official Implementation of Domain-Aware Universal Style Transfer

Domain Aware Universal Style Transfer Official Pytorch Implementation of 'Domain Aware Universal Style Transfer' (ICCV 2021) Domain Aware Universal St

KibeomHong 80 Dec 30, 2022
PyTorch implementations of the NeRF model described in "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis"

PyTorch NeRF and pixelNeRF NeRF: Tiny NeRF: pixelNeRF: This repository contains minimal PyTorch implementations of the NeRF model described in "NeRF:

Michael A. Alcorn 178 Dec 20, 2022
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.

Light Gradient Boosting Machine LightGBM is a gradient boosting framework that uses tree based learning algorithms. It is designed to be distributed a

Microsoft 14.5k Jan 08, 2023
ScaleNet: A Shallow Architecture for Scale Estimation

ScaleNet: A Shallow Architecture for Scale Estimation Repository for the code of ScaleNet paper: "ScaleNet: A Shallow Architecture for Scale Estimatio

Axel Barroso 34 Nov 09, 2022
This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks.

Orientation independent Möbius CNNs This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of

Maurice Weiler 59 Dec 09, 2022