Code for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators"

Last update: Nov 20, 2022

Overview

Query Variation Generators

This repository contains the code and annotation data for the ECIR'22 paper "Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators".

Setup

Install the requirements using

pip install -r requirements.txt

Steps to reproduce the results

First, generate the weak supervision (query variations) for the desired test sets with examples/generate_weak_supervision.py. In the paper we test on TREC-DL ('msmarco-passage/trec-dl-2019/judged') and ANTIQUE ('antique/train/split200-valid'), but any dataset from ir_datasets (https://ir-datasets.com/index.html) can be used here (passed as TASK).

python ${REPO_DIR}/examples/generate_weak_supervision.py \
    --task $TASK \
    --output_dir $OUT_DIR

This generates one query variation per method for each original query. We then manually annotated the generated variations so that only valid ones were kept for analysis. Two scripts support this step: analyze_weak_supervision.py prepares the data for manual annotation, and analyze_auto_query_generation_labeling.py combines the automatic labels with the manual annotations.
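The generate-then-filter workflow described above can be pictured with the minimal Python sketch below. It is not the repository's code. The three generators (an adjacent-character swap, a word shuffle and stopword removal), the stopword list, the row fields and the `keep_valid` rule are all hypothetical stand-ins. The filtering rule assumes that a variation identical to its original query is discarded automatically and that the annotator's verdict decides the rest.

```python
import random
from typing import Callable, Dict, List, Tuple

# Illustrative generators only: the repository's real methods live in
# disentangled_information_needs/transformations and may behave differently.

def neighbour_char_swap(query: str, rng: random.Random) -> str:
    """Swap two adjacent characters inside one randomly chosen word (a typo)."""
    words = query.split()
    candidates = [i for i, w in enumerate(words) if len(w) > 1]
    if not candidates:
        return query
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(len(w) - 1)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def random_word_order(query: str, rng: random.Random) -> str:
    """Shuffle the order of the query terms."""
    words = query.split()
    rng.shuffle(words)
    return " ".join(words)


# Tiny, hypothetical stopword list used only for this sketch.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "what", "how", "for"}


def remove_stopwords(query: str, rng: random.Random) -> str:
    """Drop stopwords, falling back to the original query if nothing is left."""
    kept = [w for w in query.split() if w.lower() not in STOPWORDS]
    return " ".join(kept) if kept else query


GENERATORS: Dict[str, Callable[[str, random.Random], str]] = {
    "neighbour_char_swap": neighbour_char_swap,
    "random_word_order": random_word_order,
    "remove_stopwords": remove_stopwords,
}


def generate_variations(queries: Dict[str, str], seed: int = 42) -> List[dict]:
    """Produce exactly one variation per (query, method) pair."""
    rng = random.Random(seed)
    return [
        {"qid": qid, "method": name, "original": text, "variation": fn(text, rng)}
        for qid, text in queries.items()
        for name, fn in GENERATORS.items()
    ]


def keep_valid(rows: List[dict], manual: Dict[Tuple[str, str], bool]) -> List[dict]:
    """Keep variations that differ from the original and were annotated as valid.

    `manual` maps (qid, method) to the annotator's verdict; missing keys are
    treated as invalid.
    """
    return [
        r for r in rows
        if r["variation"].lower() != r["original"].lower()
        and manual.get((r["qid"], r["method"]), False)
    ]
```

For example, `keep_valid(generate_variations({"q1": "what is the capital of france"}), {("q1", "remove_stopwords"): True})` keeps only the stopword-removal variation "capital france".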

However, to reproduce the results you can directly use the annotated query set to test the robustness of neural ranking models (RQ1):

python ${REPO_DIR}/disentangled_information_needs/evaluation/query_rewriting.py \
        --task 'irds:msmarco-passage/trec-dl-2019/judged' \
        --output_dir $OUT_DIR/ \
        --variations_file $OUT_DIR/$VARIATIONS_FILE_TREC_DL \
        --retrieval_model_name "BM25+KNRM" \
        --train_dataset "irds:msmarco-passage/train" \
        --max_iter $MAX_ITER

Here, "$OUT_DIR/$VARIATIONS_FILE_TREC_DL" is the annotated variations file. The rank fusion experiments (RQ2) can be run the same way by replacing query_rewriting.py with rank_fusion.py.
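This README does not say which fusion method rank_fusion.py implements. The minimal Python sketch below therefore assumes one common choice, reciprocal rank fusion (RRF). Each ranking, for example one per query variation, adds 1 / (k + rank) to a document's fused score. The function name and the default k = 60 are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of doc ids with reciprocal rank fusion (RRF).

    Each ranking (e.g. one per query variation) contributes 1 / (k + rank)
    to a document's score, with ranks starting at 1.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Documents ranked well by many variations rise to the top. A document retrieved by only one variation still appears in the fused list, but lower down.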

The scripts evaluate_weak_supervision.sh and evaluate_rank_fusion.sh run all models and datasets for both research questions. The first generates the main table of results (Table 4 in the paper), and the second generates the tables for the rank fusion experiments (only available in the arXiv version of the paper).
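One way to summarise robustness is to compare a model's effectiveness on the original queries with its effectiveness on their variations. The minimal Python sketch below assumes that the summary is the percentage change in a per-query metric (such as nDCG@10), averaged over the queries present in both runs. The function name and input format are hypothetical and are not taken from the repository.

```python
from statistics import mean
from typing import Dict


def relative_change(original: Dict[str, float], varied: Dict[str, float]) -> float:
    """Percentage change in mean effectiveness from original to varied queries.

    Both inputs map query id -> per-query metric value; only query ids present
    in both runs are compared.
    """
    shared = sorted(original.keys() & varied.keys())
    if not shared:
        raise ValueError("no overlapping query ids")
    base = mean(original[q] for q in shared)
    if base == 0:
        raise ValueError("baseline mean is zero")
    new = mean(varied[q] for q in shared)
    return 100.0 * (new - base) / base
```

A negative value means the model performs worse on the variations than on the original queries.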

Modules and Folders

scripts: Contains most of the analysis scripts, as well as commands to run entire experiments.
examples: Contains an example of how to generate query variations.
disentangled_information_needs/evaluation: Scripts to evaluate the robustness of models to query variations and to evaluate rank fusion of query variations.
disentangled_information_needs/transformations: Methods to generate query variations.


Owner

Gustavo Penha