Data for "Driving the Herd: Search Engines as Content Influencers" paper

Overview

herding_data

Dataset description

The collection contains 2250 documents: 30 initial relevant documents (round 0), located in the initial_documents.trectext file; 2100 documents (rounds 1-5) created by competitors; and 120 example documents that were manually promoted in the herding method experiments.

The dataset is organized by the content-effect experiments described in the paper.

Format: trectext. DOCNO Format: ROUND- - -
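
As an illustration of reading these files, here is a minimal sketch of a trectext reader. It assumes the common TREC text layout, in which each document is wrapped in `<DOC>` tags with `<DOCNO>` and `<TEXT>` children; the function name `parse_trectext` and the sample DOCNOs are hypothetical and are not taken from the dataset.

```python
import re

# Assumed layout: <DOC><DOCNO>...</DOCNO><TEXT>...</TEXT></DOC>
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>(.*?)</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)


def parse_trectext(content):
    """Return a dict mapping each DOCNO to its document text."""
    docs = {}
    for match in DOC_RE.finditer(content):
        body = match.group(1)
        docno = DOCNO_RE.search(body)
        text = TEXT_RE.search(body)
        if docno is None:
            continue  # skip malformed entries that carry no DOCNO
        docs[docno.group(1).strip()] = text.group(1).strip() if text else ""
    return docs


# Hypothetical sample; real DOCNOs follow the format stated above.
sample = """<DOC>
<DOCNO>EXAMPLE-DOC-1</DOCNO>
<TEXT>
First document body.
</TEXT>
</DOC>
<DOC>
<DOCNO>EXAMPLE-DOC-2</DOCNO>
<TEXT>
Second document body.
</TEXT>
</DOC>
"""

docs = parse_trectext(sample)
```

To read a file from the collection, pass its contents to the function, for example `parse_trectext(open("initial_documents.trectext", encoding="utf-8").read())`.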

Relevance Judgments (qrels):

All documents in the collection were judged for relevance. Annotators were presented with both the title and the description of each TREC topic and were asked to classify a document as relevant if it satisfies the information need stated in the description.

A document judged relevant by fewer than three annotators was labeled as non-relevant (0). Documents judged relevant by at least three, four, or five annotators were labeled as marginally relevant (1), fairly relevant (2), and highly relevant (3), respectively. For each experiment, the relevance judgment file has a ".rel" suffix.
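
The labeling rule above can be sketched as a small function. This is only an illustration, assuming five annotators per document (as stated for the quality judgments); the name `relevance_label` is hypothetical.

```python
def relevance_label(relevant_votes, num_annotators=5):
    """Map the number of 'relevant' votes to a graded relevance label.

    Assumes five annotators per document; this function is an
    illustrative sketch, not code shipped with the dataset.
    """
    if not 0 <= relevant_votes <= num_annotators:
        raise ValueError("vote count must be between 0 and num_annotators")
    if relevant_votes >= 5:
        return 3  # highly relevant
    if relevant_votes >= 4:
        return 2  # fairly relevant
    if relevant_votes >= 3:
        return 1  # marginally relevant
    return 0      # non-relevant: fewer than three votes


labels = [relevance_label(v) for v in range(6)]
```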

Quality judgments:

All documents in the collection were judged for quality by five annotators. Annotators were presented with the text of the document and were asked to classify the document as: (1) Valid, (2) Keyword-stuffed, (3) Spam.

A document is deemed keyword-stuffed if it contains excessive repetition of words that seems unnatural or artificially introduced.

A document is considered spam if its content could not possibly satisfy any information need.

If a document is neither spam nor keyword-stuffed, it is considered valid. Documents judged valid by at least three, four, or five annotators were labeled as marginally high-quality (1), fairly high-quality (2), and highly high-quality (3), respectively. For each experiment, the quality judgment file has a ".ks" suffix.
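
A minimal sketch of how five categorical annotations could be aggregated into a quality grade. The category strings and the function name `quality_label` are hypothetical, and the label 0 for fewer than three "valid" votes is an assumption that mirrors the relevance scheme; the text above does not state it.

```python
from collections import Counter

# Hypothetical category names for the three annotation classes.
VALID, KEYWORD_STUFFED, SPAM = "valid", "keyword-stuffed", "spam"


def quality_label(annotations):
    """Grade a document by how many annotators classified it as valid."""
    valid_votes = Counter(annotations)[VALID]
    if valid_votes >= 5:
        return 3  # highly high-quality
    if valid_votes >= 4:
        return 2  # fairly high-quality
    if valid_votes >= 3:
        return 1  # marginally high-quality
    return 0      # assumption: below three valid votes maps to 0


example = quality_label([VALID, VALID, VALID, KEYWORD_STUFFED, SPAM])
```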

Queries

We used 30 of the ClueWeb09 queries, which can be downloaded here: http://trec.nist.gov/data/webmain.html.

Example documents

In the herding method experiment, for each query and effect, an example document manifesting the desired content effect was manually promoted to first place. For each effect, the example documents are located in the "herding__example_documents.trectext" file. The format of document names is: DOCNO Format: ROUND-00- -EXAMPLEDOC

Subtopic effect experiment

This content effect was tested with both the herding and biasing approaches. For each query, 2 different subtopics were tested. The subtopics were taken from the ClueWeb09 subtopics list. The mapping between the qid and the number of the promoted subtopic (and the actual information need manifested by the subtopic) is located in the _subtopics_map.txt files (in each relevant directory separately).

We include relevance judgments for each document (competing for a ranking w.r.t. a query) w.r.t. both subtopics promoted for the query. Please note that each document was tested w.r.t. a single subtopic (which can be inferred from the mapping file) during the experiment. The judgments for both subtopics are provided for analysis purposes only. The relevance judgment files w.r.t. subtopics are named " _relevance_to_subptopic.rel".

The qrels format is: " ".

Directories

Herding

Document_length_effect

The data contained in this directory is related to the documents created in the document length effect experiment (herding method).

Non_relevance_effect

The data contained in this directory is related to the documents created in the non-relevance effect experiment (herding method).

Query_terms_effect

The data contained in this directory is related to the documents created in the query terms effect experiment (herding method).

Subtopic_effect

The data contained in this directory is related to the documents created in the subtopic effect experiment (herding method).

Biasing

Subtopic_effect

The data contained in this directory is related to the documents created in the subtopic effect experiment (biasing method).

Control

The data contained in this directory is related to the documents created in the control group, that is, the group that was not exposed to any kind of manipulation.

Dummies

The data contained in this directory is related to the documents taken from the Raifer et al. '17 dataset. Dummies with docnos "DUMMY_{0,1}" were shared across all groups.

The control group and biasing groups were filled with DUMMY_2 dummies (in the docno) as well.
