Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Overview

MTM

This is the official repository of the paper:

Article Reranking by Memory-enhanced Key Sentence Matching for Detecting Previously Fact-checked Claims.

Qiang Sheng, Juan Cao, Xueyao Zhang, Xirong Li, and Lei Zhong.

Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)

PDF / Poster / Code / Chinese Dataset / Chinese Blog 1 / Chinese Blog 2

Datasets

There are two experimental datasets, including the Twitter Dataset, and the firstly proposed Weibo Dataset. Note that you can download the Weibo Dataset only after an "Application to Use the Chinese Dataset for Detecting Previously Fact-Checked Claim" has been submitted.

Code

Key Requirements

python==3.6.10
torch==1.6.0
torchvision==0.7.0
transformers==3.2.0

Usage for Weibo Dataset

After you download the dataset (the way to access is described here), move the FN_11934_filtered.json and DN_27505_filtered.json into the path MTM/dataset/Weibo/raw:

mkdir MTM/dataset/Weibo/raw
mv FN_11934_filtered.json MTM/dataset/Weibo/raw
mv DN_27505_filtered.json MTM/dataset/Weibo/raw

Preparation

Tokenize

cd MTM/preprocess/tokenize
sh run_weibo.sh

ROT

cd MTM/preprocess/ROT

You can refer to the run_weibo.sh, which includes three steps:

  1. Prepare RougeBert's Training data:

    python prepare_for_rouge.py --dataset Weibo --pretrained_model bert-base-chinese
    
  2. Training:

    CUDA_VISIBLE_DEVICES=0 python main.py --debug False \
    --dataset Weibo --pretrained_model bert-base-chinese --save './ckpts/Weibo' \
    --rouge_bert_encoder_layers 1 --rouge_bert_regularize 0.01 \
    --fp16 True
    

    then you can get ckpts/Weibo/[EPOCH].pt.

  3. Vectorize the claims and articles (get embeddings):

    CUDA_VISIBLE_DEVICES=0 python get_embeddings.py \
    --dataset Weibo --pretrained_model bert-base-chinese \
    --rouge_bert_model_file './ckpts/Weibo/[EPOCH].pt' \
    --batch_size 1024 --embeddings_type static
    

PMB

cd MTM/preprocess/PMB
  1. Prepare the clustering data:

    mkdir data
    mkdir data/Weibo
    

    and you can get data/Weibo/clustering_training_data_[TS_SMALL] <[TS_LARGE].pkl after running calculate_init_thresholds.ipynb.

  2. Kmeans clustering. You can refer to the run_weibo.sh:

    python kmeans_clustering.py --dataset Weibo --pretrained_model bert-base-chinese --clustering_data_file 'data/Weibo/clustering_training_data_[TS_SMALL]
         
          <[TS_LARGE].pkl'
    
         

    then you can get data/Weibo/kmeans_cluster_centers.npy.

Besides, it is available to see some cases of key sentences selection in key_sentences_selection_cases_Weibo.ipynb.

Training and Inferring

cd MTM/model
mkdir data
mkdir data/Weibo

You can refer to the run_weibo.sh:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False --save 'ckpts/Weibo' \
--dataset 'Weibo' --pretrained_model 'bert-base-chinese' \
--rouge_bert_model_file '../preprocess/ROT/ckpts/Weibo/[EPOCH].pt' \
--memory_init_file '../preprocess/PMB/data/Weibo/kmeans_cluster_centers.npy' \
--claim_sentence_distance_file './data/Weibo/claim_sentence_distance.pkl' \
--pattern_sentence_distance_init_file './data/Weibo/pattern_sentence_distance_init.pkl' \
--memory_updated_step 0.3 --lambdaQ 0.6 --lambdaP 0.4 \
--selected_sentences 3 \
--lr 5e-6 --epochs 10 --batch_size 32 \

then the results and ranking reports will be saved in ckpts/Weibo.

Usage for Twitter Dataset

The description of the dataset can be seen at here.

Preparation

Tokenize

cd MTM/preprocess/tokenize
sh run_twitter.sh

ROT

cd MTM/preprocess/ROT

You can refer to the run_twitter.sh, which includes three steps:

  1. Prepare RougeBert's Training data:

    python prepare_for_rouge.py --dataset Twitter --pretrained_model bert-base-uncased
    
  2. Training:

    CUDA_VISIBLE_DEVICES=0 python main.py --debug False \
    --dataset Twitter --pretrained_model bert-base-uncased --save './ckpts/Twitter' \
    --rouge_bert_encoder_layers 1 --rouge_bert_regularize 0.05 \
    --fp16 True
    

    then you can get ckpts/Twitter/[EPOCH].pt.

  3. Vectorize the claims and articles (get embeddings):

    CUDA_VISIBLE_DEVICES=0 python get_embeddings.py \
    --dataset Twitter --pretrained_model bert-base-uncased \
    --rouge_bert_model_file './ckpts/Twitter/[EPOCH].pt' \
    --batch_size 1024 --embeddings_type static
    

PMB

cd MTM/preprocess/PMB
  1. Prepare the clustering data:

    mkdir data
    mkdir data/Twitter
    

    and you can get data/Twitter/clustering_training_data_[TS_SMALL] <[TS_LARGE].pkl after running calculate_init_thresholds.ipynb.

  2. Kmeans clustering. You can refer to the run_twitter.sh:

    python kmeans_clustering.py --dataset Twitter --pretrained_model bert-base-uncased --clustering_data_file 'data/Twitter/clustering_training_data_[TS_SMALL]
         
          <[TS_LARGE].pkl'
    
         

    then you can get data/Twitter/kmeans_cluster_centers.npy.

Besides, it is available to see some cases of key sentences selection in key_sentences_selection_cases_Twitter.ipynb.

Training and Inferring

cd MTM/model
mkdir data
mkdir data/Twitter

You can refer to the run_twitter.sh:

CUDA_VISIBLE_DEVICES=0 python main.py --debug False --save 'ckpts/Twitter' \
--dataset 'Twitter' --pretrained_model 'bert-base-uncased' \
--rouge_bert_model_file '../preprocess/ROT/ckpts/Twitter/[EPOCH].pt' \
--memory_init_file '../preprocess/PMB/data/Twitter/kmeans_cluster_centers.npy' \
--claim_sentence_distance_file './data/Twitter/claim_sentence_distance.pkl' \
--pattern_sentence_distance_init_file './data/Twitter/pattern_sentence_distance_init.pkl' \
--memory_updated_step 0.3 --lambdaQ 0.6 --lambdaP 0.4 \
--selected_sentences 5 \
--lr 1e-4 --epochs 10 --batch_size 16 \

then the results and ranking reports will be saved in ckpts/Twitter.

Citation

@inproceedings{MTM,
  author    = {Qiang Sheng and
               Juan Cao and
               Xueyao Zhang and
               Xirong Li and
               Lei Zhong},
  title     = {Article Reranking by Memory-Enhanced Key Sentence Matching for Detecting
               Previously Fact-Checked Claims},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational
               Linguistics and the 11th International Joint Conference on Natural
               Language Processing, {ACL/IJCNLP} 2021},
  pages     = {5468--5481},
  publisher = {Association for Computational Linguistics},
  year      = {2021},
  url       = {https://doi.org/10.18653/v1/2021.acl-long.425},
  doi       = {10.18653/v1/2021.acl-long.425},
}
Owner
ICTMCG
Multimedia Computing Group, Institute of Computing Technology, Chinese Academy of Sciences. Our official account on WeChat: ICTMCG.
ICTMCG
Akshat Surolia 2 May 11, 2022
Project Aquarium is a SUSE-sponsored open source project aiming at becoming an easy to use, rock solid storage appliance based on Ceph.

Project Aquarium Project Aquarium is a SUSE-sponsored open source project aiming at becoming an easy to use, rock solid storage appliance based on Cep

Aquarist Labs 73 Jul 21, 2022
ANN model for prediction a spatio-temporal distribution of supercooled liquid in mixed-phase clouds using Doppler cloud radar spectra.

VOODOO Revealing supercooled liquid beyond lidar attenuation Explore the docs » Report Bug · Request Feature Table of Contents About The Project Built

remsens-lim 2 Apr 28, 2022
Code for our NeurIPS 2021 paper Mining the Benefits of Two-stage and One-stage HOI Detection

CDN Code for our NeurIPS 2021 paper "Mining the Benefits of Two-stage and One-stage HOI Detection". Contributed by Aixi Zhang*, Yue Liao*, Si Liu, Mia

71 Dec 14, 2022
Code for "R-GCN: The R Could Stand for Random"

RR-GCN: Random Relational Graph Convolutional Networks PyTorch Geometric code for the paper "R-GCN: The R Could Stand for Random" RR-GCN is an extensi

PreDiCT.IDLab 31 Sep 07, 2022
My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

yobi byte 29 Oct 09, 2022
MAME is a multi-purpose emulation framework.

MAME's purpose is to preserve decades of software history. As electronic technology continues to rush forward, MAME prevents this important "vintage" software from being lost and forgotten.

Michael Murray 6 Oct 25, 2020
An implementation of MobileFormer

MobileFormer An implementation of MobileFormer proposed by Yinpeng Chen, Xiyang Dai et al. Including [1] Mobile-Former proposed in:

slwang9353 62 Dec 28, 2022
This repo is the code release of EMNLP 2021 conference paper "Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories".

Connect-the-Dots: Bridging Semantics between Words and Definitions via Aligning Word Sense Inventories This repo is the code release of EMNLP 2021 con

12 Nov 22, 2022
Fre-GAN: Adversarial Frequency-consistent Audio Synthesis

Fre-GAN Vocoder Fre-GAN: Adversarial Frequency-consistent Audio Synthesis Training: python train.py --config config.json Citation: @misc{kim2021frega

Rishikesh (ऋषिकेश) 93 Dec 17, 2022
DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation

DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation This repository is the implementation of DynaTune paper. This folder

4 Nov 02, 2022
A TensorFlow implementation of the Mnemonic Descent Method.

MDM A Tensorflow implementation of the Mnemonic Descent Method. Mnemonic Descent Method: A recurrent process applied for end-to-end face alignment G.

123 Oct 07, 2022
[CVPR 2022] Unsupervised Image-to-Image Translation with Generative Prior

GP-UNIT - Official PyTorch Implementation This repository provides the official PyTorch implementation for the following paper: Unsupervised Image-to-

Shuai Yang 125 Jan 03, 2023
DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations

DSTC10 Track 2 - Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations This repository contains the data, scripts and baseline co

Alexa 51 Dec 17, 2022
RefineMask (CVPR 2021)

RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features (CVPR 2021) This repo is the official implementation of RefineMask:

Gang Zhang 191 Jan 07, 2023
Manifold Alignment for Semantically Aligned Style Transfer

Manifold Alignment for Semantically Aligned Style Transfer [Paper] Getting Started MAST has been tested on CentOS 7.6 with python = 3.6. It supports

35 Nov 14, 2022
Detector for Log4Shell exploitation attempts

log4shell-detector Detector for Log4Shell exploitation attempts Idea The problem with the log4j CVE-2021-44228 exploitation is that the string can be

Florian Roth 729 Dec 25, 2022
CS550 Machine Learning course project on CNN Detection.

CNN Detection (CS550 Machine Learning Project) Team Members (Tensor) : Yadava Kishore Chodipilli (11940310) Thashmitha BS (11941250) This is a work do

yaadava_kishore 2 Jan 30, 2022
CLIP+FFT text-to-image

Aphantasia This is a text-to-image tool, part of the artwork of the same name. Based on CLIP model, with FFT parameterizer from Lucent library as a ge

vadim epstein 690 Jan 02, 2023
Semi-supervised semantic segmentation needs strong, varied perturbations

Semi-supervised semantic segmentation using CutMix and Colour Augmentation Implementations of our papers: Semi-supervised semantic segmentation needs

146 Dec 20, 2022