Codes for NAACL 2021 Paper "Unsupervised Multi-hop Question Answering by Question Generation"

Overview

Unsupervised-Multi-hop-QA

This repository contains code and models for the paper: Unsupervised Multi-hop Question Answering by Question Generation (NAACL 2021).

  • We propose MQA-QG, an unsupervised question answering framework that can generate human-like multi-hop training pairs from both homogeneous and heterogeneous data sources.

  • We find that we can train a competent multi-hop QA model with only generated data. The F1 gap between the unsupervised and fully-supervised models is less than 20 in both the HotpotQA and the HybridQA dataset.

  • Pretraining a multi-hop QA model with our generated data would greatly reduce the demand for human-annotated training data for multi-hop QA.

Introduction

The model first defines a set of basic operators to retrieve / generate relevant information from each input source or to aggregate different information, as follows.

Afterwards, we define six Reasoning Graphs. Each corresponds to one type of multihop question and is formulated as a computation graph built upon the operators. We generate multihop question-answer pairs by executing the reasoning graph.

Requirements

  • Python 3.7.3
  • torch 1.7.1
  • tqdm 4.49.0
  • transformers 4.3.3
  • stanza 1.1.1
  • nltk 3.5
  • dateparser 1.0.0
  • scikit-learn 0.23.2
  • fuzzywuzzy 0.18.0

Data Preparation

Make the following data directories:

mkdir -p ./Data
mkdir -p ./Data/HotpotQA
mkdir -p ./Data/HybridQA

a) HotpotQA

First, download the raw dataset of hotpotQA.

HOTPOT_HOME=./Data/HotpotQA
mkdir -p $HOTPOT_HOME/raw
mkdir -p $HOTPOT_HOME/dataset
cd $HOTPOT_HOME/raw
wget http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_train_v1.1.json
wget http://curtis.ml.cmu.edu/datasets/hotpot/hotpot_dev_distractor_v1.json

Then, run the following code to preprocess the raw dataset.

python prep_data_hotpotQA.py \
  --train_dir $HOTPOT_HOME/raw/hotpot_train_v1.1.json \
  --dev_dir $HOTPOT_HOME/raw/hotpot_dev_distractor_v1.json \
  --output_dir $HOTPOT_HOME/dataset/

You would be able to get the following files in ./Data/HotpotQA/dataset/

train.src.json
train.qa.json
dev.src.json
dev.qa.json

b) HybridQA

Download all the tables and passages of HybridQA into your data folder.

HYBRID_HOME=./Data/HybridQA
cd HYBRID_HOME
git clone https://github.com/wenhuchen/WikiTables-WithLinks

The human annotated questions can be found here. Download train.json, dev.json, and dev_reference.json. Rename train.json as train.human.json; rename dev.json as dev.human.json, and put them into ./Data/HybridQA folder.

Operators

Here are the codes that test our key operators: QGwithAns and DescribeEnt.

a) QGwithAns

QGwithAns generate a single-hop question Q with answer A from the input text D. We implement this module based on the pretrained QG model from patil-suraj, a Google T5 model finetuned on the SQuAD 1.1 dataset.

You could test this module by running the following python codes:

from MQA_QG.Operators import T5_QG

test_passage = '''Jenson Alexander Lyons Button (born 19 January 1980) is a British racing driver and former Formula One driver. He won the 2009 Formula One World Championship, driving for Brawn GP.'''

nlp = T5_QG.pipeline("question-generation", model='valhalla/t5-base-qg-hl', qg_format="highlight")

print(nlp.qg_without_answer(test_passage))
print(nlp.qg_with_answer_text(test_passage, "19 January 1980"))

b) DescribeEnt

DescribeEnt generate a sentence S that describes the given entity E based on the information of the table T. We implement this using the GPT-TabGen model (Chen et al., 2020a). The model first uses template to flatten the table T into a document PT and then feed PT to the pre-trained GPT-2 model to generate the output sentence S. The framework is as follows.

We finetune the GPT2 model on the ToTTo dataset (Parikh et al., 2020), a large-scale dataset of controlled table-to-text generation. Our fine-tuned model can be downloaded here. After downloading the finetuned model, put it under the Pretrained_Models directory. Then you could test this module by running the following python codes:

from MQA_QG.Operators.Table_to_Text import get_GPT2_Predictor

predictor = get_GPT2_Predictor('./Pretrained_Models/table2text_GPT2_medium_ep9.pt', num_samples = 3)
flattened_table = '''The table title is Netherlands at the European Track Championships . The Medal is Bronze . The Championship is 2011 Apeldoorn . The Name is Kirsten Wild . The Event is Women's omnium . Start describing Kirsten Wild : '''
results = predictor.predict_output(flattened_table)
print(results)

Multi-hop Question Generation

After data preparation and testing operators, you could generate different types of multi-hop questions from (table, passage) in HybridQA or passages in HotpotQA. You simply need to configure your experimental setting in MQA_QG/config.py, as follows:

###### Global Settings
EXPERIMENT = 'HybridQA' # The experiment you want to run, choose 'HotpotQA' or 'HybridQA'
QG_DEVICE = 5  # gpu device to run the QG module
BERT_DEVICE = 3 # gpu device to run the BERT module
TABLE2TEXT_DEVICE = 3 # gpu devide to run the Table2Text module
QUESTION_TYPE = 'table2text' # the type of question you want to generate
# for hybridQA, the options are: 'table2text', 'text2table', 'text_only', 'table_only'
# for hotpotQA, the options are: 'text2text', 'comparison'
QUESTION_NUM = 3 # the number of questions to generate for each input

###### User-specified data directory
DATA_PATH = '../Data/HybridQA/WikiTables-WithLinks/' # root data directory, '../Data/HybridQA/WikiTables-WithLinks/' for HybridQA; '../Data/HotpotQA/dataset/train.src.txt' for HotpotQA
OUTPUT_PATH = '../Outputs/train_table_to_text.json' # the json file to store the generated questions
DATA_RANGE = [0, 20] # for debug use: the range of the dataset you considered (use [0, -1] to use the full dataset)
Table2Text_Model_Path = '../Pretrained_Models/table2text_GPT2_medium_ep9.pt' # the path to the pretrained Table2Text model

Key parameters:

  • EXPERIMENT: the dataset you want to generate questions from, choose 'HotpotQA' or 'HybridQA'.
  • QG_DEVICE, BERT_DEVICE, TABLE2TEXT_DEVICE: the gpu device to run the QG module, BERT module, and Table2Text module.
  • QUESTION_TYPE: the type of question you want to generate. There are 6 different types of questions you can generate. For hybridQA, the options are: 'table2text', 'text2table', 'text_only', 'table_only'. For hotpotQA, the options are: 'text2text', 'comparison'.
  • QUESTION_NUM: the number of questions to generate for each input.
  • DATA_PATH: root data directory, the defaults are: '../Data/HybridQA/WikiTables-WithLinks/' for HybridQA; '../Data/HotpotQA/dataset/train.src.txt' for HotpotQA.
  • OUTPUT_PATH: the json file to store the generated questions
  • Table2Text_Model_Path: the path to the pretrained Table2Text model.

After configuration, run the following python code to generate multi-hop questions.

cd MQA-QG
python run_multihop_generation

A sample of generated (question, answer) pair for HybridQA is:

{
  "table_id": "\"Weird_Al\"_Yankovic_0",
  "question": "In what film did the Dollmaker play the role of Batman?",
  "answer-text": "Batman vs. Robin",
  "answer-node": [
    [
      "Batman vs. Robin",
      [
        12,
        1
      ],
      "/wiki/Batman_vs._Robin",
      "table"
    ]
  ],
  "question_id": "6",
  "where": "table",
  "question_postag": "IN WDT NN VBD DT NN VB DT NN IN NNP ."
}

A sample of generated (question, answer) pair for HotpotQA is:

{
  "passage_id": "5a70f0c05542994082a3e404",
  "ques_ans": [
    {
      "question": "When did the name that is the nickname of Baz Ashmawy begin filming Culture Clash?",
      "answer": "September 2008"
    },
    {
      "question": "How did the book that is the nickname of Baz Ashmawy travel to film Culture Clash?",
      "answer": "travelled the world"
    },
    {
      "question": "What is the common name of the song that is the name of Bazil Ashmawy 's first television show?",
      "answer": "Baz Ashmawy"
    }
  ]
}

(Optional) You could then rank the generated questions by the PPL under the pretrained GPT-medium model, by running the following codes:

python run_ppl_ranking.py \
  --input_dir ../Outputs/train_text_to_table.json \
  --output_dir ../Outputs/PPL_rank_train_text_to_table.json

Unsupervised Multi-hop QA

a) HotpotQA

We use the SpanBERT (Joshi et al., 2020) as the QA model for HotpotQA.

Data Preparation

First, in the project root directory, run the following scripts to prepare the data.

# Prepare the human-labeled training set
python Multihop_QA/HotpotQA/prepare_qa_data.py \
  --src_path ./Data/HotpotQA/dataset/train.src.json \
  --qa_path ./Data/HotpotQA/dataset/train.qa.json \
  --output_path ./Multihop_QA/HotpotQA/data/train.human.json

# Prepare the human-labeled dev set
python Multihop_QA/HotpotQA/prepare_qa_data.py \
  --src_path ./Data/HotpotQA/dataset/dev.src.json \
  --qa_path ./Data/HotpotQA/dataset/dev.qa.json \
  --output_path ./Multihop_QA/HotpotQA/data/dev.human.json

# Prepare the generated training set 
# (the generated questions in the last multi-hop QG step, name it as `train.hotpot.generated.json`)
python Multihop_QA/HotpotQA/prepare_qa_data.py \
  --src_path ./Data/HotpotQA/dataset/train.src.json \
  --qa_path ./Data/HotpotQA/dataset/train.hotpot.generated.json \
  --output_path ./Multihop_QA/HotpotQA/data/train.generated.json

This will create three datasets in the ./Multihop_QA/HotpotQA/data/ directory:

  • train.human.json: the human-labeled HotpotQA training set (90442 samples).
  • dev.human.json: the human-labeled HotpotQA validation set (7405 samples).
  • train.generated.json: the QA pairs generated by our MQA-QG model.

You could skip this data preparation process by directly downloading the above three files here.

Model Training

In the ./Multihop_QA/HotpotQA/ folder, run bash train.sh to train the SpanBERT QA model. Here is an example configuration of train.sh:

#!/bin/bash
set -x

DATAHOME=./data
MODELHOME=./outputs/supervised

mkdir -p ${MODELHOME}

export CUDA_VISIBLE_DEVICES=2

python code/run_mrqa.py \
  --do_train \
  --do_eval \
  --model spanbert-large-cased \
  --train_file ${DATAHOME}/train.human.json \
  --dev_file ${DATAHOME}/dev.human.json \
  --train_batch_size 32 \
  --eval_batch_size 32 \
  --gradient_accumulation_steps 8 \
  --learning_rate 2e-5 \
  --num_train_epochs 4 \
  --max_seq_length 512 \
  --doc_stride 128 \
  --eval_per_epoch 10 \
  --output_dir ${MODELHOME} \

There are two typical settings:

  • Supervised QA Setting: train the SpanBERT model on the human-labeled training set (train.human.json) and then evaluate the performance on the human-labeled validation set (dev.human.json).

  • Unsupervised QA Setting: train the SpanBERT model on the generated training set (train.generated.json) and then evaluate the performance on the human-labeled validation set (dev.human.json).

Evaluation

In the ./Multihop_QA/HotpotQA/ folder, run bash evaluate.sh to train the SpanBERT QA model. Here is an example configuration of evaluate.sh:

set -x

DATAHOME=./data/dev.human.json
MODELHOME=./outputs/unsupervised

export CUDA_VISIBLE_DEVICES=4

python code/run_mrqa.py \
  --do_eval \
  --eval_test \
  --model spanbert-large-cased \
  --test_file ${DATAHOME} \
  --eval_batch_size 32 \
  --max_seq_length 512 \
  --doc_stride 128 \
  --output_dir ${MODELHOME}

After evaluation, two files will be outputed to the model path:

  • test_results.txt: reporting the EM and F1.
  • predictions.txt: saving the QA results.

b) HybridQA

We use the HYBRIDER (Chen et al., 2020b) as the QA model for HybridQA.

Data Preparation

First, in the project root directory, run the following scripts to prepare the data. Suppose the generated questions in the last multi-hop QG step are saved in train.generated.json and put it into ./Data/HybridQA/ folder.

# Prepare the human-labeled train set
python Multihop_QA/HybridQA/prepare_qa_data.py \
  --input_path ./Data/HybridQA/train.human.json \
  --data_split train \
  --output_path ./Multihop_QA/HybridQA/data/human

# Prepare the human-labeled dev set
python Multihop_QA/HybridQA/prepare_qa_data.py \
  --input_path ./Data/HybridQA/dev.human.json \
  --data_split dev \
  --output_path ./Multihop_QA/HybridQA/data/human

# Prepare the generated training set 
python Multihop_QA/HybridQA/prepare_qa_data.py \
  --input_path ./Data/HybridQA/train.generated.json \
  --data_split train \
  --output_path ./Multihop_QA/HybridQA/data/generated

This will create two folders in the ./Multihop_QA/HybridQA/data/ directory:

  • generated: the processed generated train set.
  • human: the processed human-labeled train and dev set.

You could skip this data preparation process by directly downloading the above two folders here.

Model Training

Note that training the HYBRIDER model requires transformer==2.6.0

In the ./Multihop_QA/HybridQA/ folder, run bash train.sh to train the HYBRIDER QA model. Here is an example configuration of train.sh:

python train_stage12.py \
    --do_lower_case \
    --do_train \
    --train_file ./data/human/stage1_train_data.json \
    --resource_dir ../../Data/HybridQA/WikiTables-WithLinks \
    --learning_rate 2e-6 \
    --option stage1 \
    --num_train_epochs 3.0 \
    --gpu_index 6 \
    --cache_dir ./tmp/

python train_stage12.py \
    --do_lower_case \
    --do_train \
    --train_file ./data/human/stage2_train_data.json \
    --resource_dir ../../Data/HybridQA/WikiTables-WithLinks \
    --learning_rate 5e-6 \
    --option stage2 \
    --num_train_epochs 3.0 \
    --gpu_index 6 \
    --cache_dir ./tmp/

python train_stage3.py \
    --do_train  \
    --do_lower_case \
    --train_file ./data/human/stage3_train_data.json \
    --resource_dir ../../Data/HybridQA/WikiTables-WithLinks \
    --per_gpu_train_batch_size 12 \
    --learning_rate 3e-5 \
    --num_train_epochs 4.0 \
    --max_seq_length 384 \
    --doc_stride 128 \
    --threads 8 \
    --gpu_index 6 \
    --cache_dir ./tmp/

There are two typical settings:

  • Supervised QA Setting: train the HYBRIDER model on the human-labeled training set. Set the train_file as ./data/human/stage1(2)(3)_train_data.json.

  • Unsupervised QA Setting: train the HYBRIDER model on the generated training set. Set the train_file as ./data/generated/stage1(2)(3)_train_data.json.

Evaluation

In the ./Multihop_QA/HybridQA/ folder, run bash evaluate.sh to evaluate the HYBRIDER QA model.

Reference

Please cite the paper in the following format if you use this dataset during your research.

@inproceedings{pan-etal-2021-MQA-QG,
  title={Unsupervised Multi-hop Question Answering by Question Generation},
  author={Liangming Pan, Wenhu Chen, Wenhan Xiong, Min-Yen Kan, William Yang Wang},
  booktitle = {Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
  address = {Online},
  month = {June},
  year = {2021}
}

Q&A

If you encounter any problem, please either directly contact the first author or leave an issue in the github repo.

Owner
Liangming Pan
I am a third year Computer Science Ph.D. student at National University of Singapore.
Liangming Pan
Composable transformations of Python+NumPy programsComposable transformations of Python+NumPy programs

Chex Chex is a library of utilities for helping to write reliable JAX code. This includes utils to help: Instrument your code (e.g. assertions) Debug

DeepMind 506 Jan 08, 2023
Back to Basics: Efficient Network Compression via IMP

Back to Basics: Efficient Network Compression via IMP Authors: Max Zimmer, Christoph Spiegel, Sebastian Pokutta This repository contains the code to r

IOL Lab @ ZIB 1 Nov 19, 2021
A Python package for time series augmentation

tsaug tsaug is a Python package for time series augmentation. It offers a set of augmentation methods for time series, as well as a simple API to conn

Arundo Analytics 278 Jan 01, 2023
A concise but complete implementation of CLIP with various experimental improvements from recent papers

x-clip (wip) A concise but complete implementation of CLIP with various experimental improvements from recent papers Install $ pip install x-clip Usag

Phil Wang 515 Dec 26, 2022
Distance correlation and related E-statistics in Python

dcor dcor: distance correlation and related E-statistics in Python. E-statistics are functions of distances between statistical observations in metric

Carlos Ramos CarreƱo 108 Dec 27, 2022
A custom DeepStack model that has been trained detecting ONLY the USPS logo

This repository provides a custom DeepStack model that has been trained detecting ONLY the USPS logo. This was created after I discovered that the Deepstack OpenLogo custom model I was using did not

Stephen Stratoti 9 Dec 27, 2022
Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

Albumentations Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to inc

11.4k Jan 09, 2023
Official PyTorch code for WACV 2022 paper "CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows"

CFLOW-AD: Real-Time Unsupervised Anomaly Detection with Localization via Conditional Normalizing Flows WACV 2022 preprint:https://arxiv.org/abs/2107.1

Denis 156 Dec 28, 2022
Official Datasets and Implementation from our Paper "Video Class Agnostic Segmentation in Autonomous Driving".

Video Class Agnostic Segmentation [Method Paper] [Benchmark Paper] [Project] [Demo] Official Datasets and Implementation from our Paper "Video Class A

Mennatullah Siam 26 Oct 24, 2022
This repository contains the code for TACL2021 paper: SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

SummaC: Summary Consistency Detection This repository contains the code for TACL2021 paper: SummaC: Re-Visiting NLI-based Models for Inconsistency Det

Philippe Laban 24 Jan 03, 2023
True Few-Shot Learning with Language Models

This codebase supports using language models (LMs) for true few-shot learning: learning to perform a task using a limited number of examples from a single task distribution.

Ethan Perez 124 Jan 04, 2023
face2comics by Sxela (Alex Spirin) - face2comics datasets

This is a paired face to comics dataset, which can be used to train pix2pix or similar networks.

Alex 164 Nov 13, 2022
Source code for paper "Deep Superpixel-based Network for Blind Image Quality Assessment"

DSN-IQA Source code for paper "Deep Superpixel-based Network for Blind Image Quality Assessment" Requirements Python =3.8.0 Pytorch =1.7.1 Usage wit

7 Oct 13, 2022
End-To-End Crowdsourcing

End-To-End Crowdsourcing Comparison of traditional crowdsourcing approaches to a state-of-the-art end-to-end crowdsourcing approach LTNet on sentiment

Andreas Koch 1 Mar 06, 2022
Homepage of paper: Paint Transformer: Feed Forward Neural Painting with Stroke Prediction, ICCV 2021.

Paint Transformer: Feed Forward Neural Painting with Stroke Prediction [Paper] [PaddlePaddle Implementation] Homepage of paper: Paint Transformer: Fee

442 Dec 16, 2022
diablo2 resurrected loot filter

Only For Chinese and Traditional Chinese The filter only for Chinese and Traditional Chinese, i didn't change it for other language.Maybe you could mo

elmagnifico 249 Dec 04, 2022
Extract MNIST handwritten digits dataset binary file into bmp images

MNIST-dataset-extractor Extract MNIST handwritten digits dataset binary file into bmp images More info at http://yann.lecun.com/exdb/mnist/ Dependenci

Omar Mostafa 6 May 24, 2021
Notepy is a full-featured Notepad Python app

Notepy A full featured python text-editor Notable features Autocompletion for parenthesis and quote Auto identation Syntax highlighting Compile and ru

Mirko Rovere 11 Sep 28, 2022
The official implementation of CircleNet: Anchor-free Detection with Circle Representation, MICCAI 2030

CircleNet: Anchor-free Detection with Circle Representation The official implementation of CircleNet, MICCAI 2020 [PyTorch] [project page] [MICCAI pap

The Biomedical Data Representation and Learning Lab 45 Nov 18, 2022
A curated list of awesome Deep Learning tutorials, projects and communities.

Awesome Deep Learning Table of Contents Books Courses Videos and Lectures Papers Tutorials Researchers Websites Datasets Conferences Frameworks Tools

Christos 20k Jan 05, 2023