GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

GCRC

GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation

Introduction

Machine reading comprehension (MRC) models have made exciting progress, driven by a large number of publicly available datasets. However, the real language comprehension capabilities of these models still fall far short of what people expect, and most datasets provide only black-box evaluations that cannot diagnose whether a system arrives at its answers through a correct reasoning process. To alleviate these problems and move machine intelligence closer to human-like intelligence, Shanxi University focuses on the more diverse and challenging reading comprehension tasks of the college entrance examination (Gaokao), and attempts to evaluate machine intelligence effectively and practically on the basis of standardized human tests. We collected Gaokao reading comprehension questions from the past 10 years and constructed a dataset, GCRC (A New MRC Dataset from Gaokao Chinese for Explainable Evaluation), containing more than 5,000 texts and more than 8,700 multiple-choice questions (about 15,000 options). The dataset is annotated with three kinds of information: sentence-level supporting facts, the error cause of each distractor option, and the reasoning skills required to answer the question. Experiments show that the dataset is challenging and useful for diagnosing system limitations in an interpretable manner, and we hope it will help researchers develop new machine learning and reasoning methods for these problems.

Leaderboard

GCRC Leaderboard for Explainable Evaluation

Paper

GCRC: A New Challenging MRC Dataset from Gaokao Chinese for Explainable Evaluation. ACL 2021 Findings.

Data Size

Train: 6,994 questions; Dev: 863 questions; Test: 862 questions

Data Format

Each instance consists of the following fields: id (a string), title (a string), passage (a string), question (a string), options (a list holding the text of options A, B, C, and D, in that order), evidences (a list holding, for each of A, B, C, and D, the supporting sentences from the original passage), reasoning_ability (a list holding the reasoning ability required for each of A, B, C, and D), error_type (a list holding the error cause for each of A, B, C, and D), and answer (a string).

Example

{
  "id": "gcrc_4916_8172", 
  "title": "我们需要怎样的科学素养", 
  "passage": "第八次中国公民科学素养调查显示,2010年,我国具备...激励科技创新、促进创新型国家建设,我们任重道远。", 
  "question": "下列对“我们需要怎样的科学素养”的概括,不正确的一项是", 
  "options":  [
    "科学素养是一项基本公民素质,公民科学素养可以从科学知识、科学方法和科学精神三个方面来衡量。",
    "不仅需要掌握足够的科学知识、科学方法,更需要具备学习、理解、表达、参与和决策科学事务的能力。",
    "应该明白科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面。", 
    "需要具备科学的思维和科学的精神,对科学技术能持怀疑态度,对于媒体信息具有质疑精神和过滤功能。"
  ],
  "evidences": [
    ["公民科学素养可以从三个方面衡量:科学知识、科学方法和科学精神。", "在“建设创新型国家”的语境中,科学素养作为一项基本公民素质的重要性不言而喻。"],
    ["一个具备科学素养的公民,不仅应该掌握足够的科学知识、科学方法,更需要强调科学的思维、科学的精神,理性认识科技应用到社会中可能产生的影响,进而具备学习、理解、表达、参与和决策科学事务的能力。"], 
    ["西方发达国家不仅测试公众对科学技术与社会、经济、文化等各方面关系的看法,更考察公众对科学技术是否持怀疑态度,是否认为科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面等。"], 
    ["甚至还有国家专门测试公众对于媒体信息是否具有质疑精神和过滤功能。", "西方发达国家不仅测试公众对科学技术与社会、经济、文化等各方面关系的看法,更考察公众对科学技术是否持怀疑态度,是否认为科学技术需要控制,期望科学技术解决哪些问题,希望所纳的税费使用于科学技术的哪些方面等。"]
   ],
  "error_type": ["E", "", "", ""],
  "answer": "A"
}
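To make the format concrete, here is a minimal Python sketch that checks whether one instance follows the field layout described above. It is not part of the released code: the function name `validate_instance`, the choice to treat the per-option annotation lists as optional, the assumption that the answer is one of "A" to "D", and the shortened demo instance are all assumptions made for illustration.

```python
import json

# Field names come from the format description above. Treating the
# per-option annotation lists as optional is an assumption of this sketch.
REQUIRED_KEYS = {"id", "title", "passage", "question", "options", "answer"}
PER_OPTION_KEYS = ("evidences", "reasoning_ability", "error_type")


def validate_instance(instance):
    """Return a list of human-readable problems found in one instance."""
    problems = []
    missing = REQUIRED_KEYS - set(instance)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    options = instance.get("options", [])
    if len(options) != 4:
        problems.append(f"expected 4 options, got {len(options)}")
    # Each annotation list should have one entry per option (A, B, C, D).
    for key in PER_OPTION_KEYS:
        if key in instance and len(instance[key]) != len(options):
            problems.append(
                f"{key} has {len(instance[key])} entries for {len(options)} options"
            )
    # Assumption: the answer is a single option letter.
    if instance.get("answer") not in ("A", "B", "C", "D"):
        problems.append(f"unexpected answer: {instance.get('answer')!r}")
    return problems


# Hypothetical, shortened instance used only to exercise the checker.
raw = """
{
  "id": "demo_1",
  "title": "demo title",
  "passage": "s1. s2. s3. s4.",
  "question": "demo question",
  "options": ["o1", "o2", "o3", "o4"],
  "evidences": [["s1."], ["s2."], ["s3."], ["s4."]],
  "error_type": ["E", "", "", ""],
  "answer": "A"
}
"""
instance = json.loads(raw)
print(validate_instance(instance))
```

Running a check like this over a prediction file before submission can catch length mismatches between options and the annotation lists early.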

Evaluation Code

The prediction file must follow the same format as the training set.

python eval.py prediction_file test_private_file

Participants are required to complete the following tasks:
Task 1: Output the answer to the question.
Task 2: Output the sentence-level supporting facts (SFs) that support the answer to the question, that is, the original supporting sentences for each option.
Task 3: Output the error cause of each distractor option. There are 7 error causes in this evaluation: 1) Wrong details; 2) Wrong temporal properties; 3) Wrong subject-predicate-object triple relationship; 4) Wrong necessary and sufficient conditions; 5) Wrong causality; 6) Irrelevant to the question; 7) Irrelevant to the article.

The evaluation metrics are Task1_Acc, Task2_F1, and Task3_Acc (the accuracy of error cause identification), and the output is in dictionary format.

return {"Task1_Acc":_, "Task2_F1":_, "Task3_Acc":_}
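As a rough illustration of how the three metrics could be computed, the sketch below scores predictions against gold instances. It is not the official eval.py, whose exact definitions may differ. The assumptions here are: predictions are matched to gold instances by id, Task2_F1 is a set-based sentence F1 averaged over all options, and Task3_Acc is computed over every option position, including options whose gold error type is empty. The function names are hypothetical.

```python
def sentence_f1(gold_sents, pred_sents):
    """Set-based F1 between gold and predicted supporting sentences."""
    gold, pred = set(gold_sents), set(pred_sents)
    if not gold and not pred:
        return 1.0  # nothing to find and nothing predicted
    overlap = len(gold & pred)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(gold_instances, pred_instances):
    """Assumed scoring scheme; the official eval.py may differ."""
    pred_by_id = {p["id"]: p for p in pred_instances}
    answer_hits, f1_scores, error_hits, error_total = 0, [], 0, 0
    for gold in gold_instances:
        pred = pred_by_id.get(gold["id"], {})
        # Task 1: exact match on the answer letter.
        answer_hits += int(pred.get("answer") == gold["answer"])
        # Task 2: one F1 score per option; missing predictions count as empty.
        pred_evidences = pred.get("evidences", [])
        for i, gold_sents in enumerate(gold["evidences"]):
            pred_sents = pred_evidences[i] if i < len(pred_evidences) else []
            f1_scores.append(sentence_f1(gold_sents, pred_sents))
        # Task 3: exact match on the error type of each option.
        pred_errors = pred.get("error_type", [])
        for i, gold_error in enumerate(gold["error_type"]):
            pred_error = pred_errors[i] if i < len(pred_errors) else None
            error_hits += int(pred_error == gold_error)
            error_total += 1
    n = len(gold_instances)
    return {
        "Task1_Acc": answer_hits / n if n else 0.0,
        "Task2_F1": sum(f1_scores) / len(f1_scores) if f1_scores else 0.0,
        "Task3_Acc": error_hits / error_total if error_total else 0.0,
    }
```

A local scorer of this kind is only useful for sanity checks during development; reported numbers should come from the provided eval.py.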

Author List

Hongye Tan, Xiaoyue Wang, Yu Ji, Ru Li, Xiaoli Li, Zhiwei Hu, Yunxiao Zhao, Xiaoqi Han.

Institutions

Shanxi University

Citation

Please kindly cite our paper if the work is helpful.

@inproceedings{tan-etal-2021-gcrc,
    title = "{GCRC}: A New Challenging {MRC} Dataset from {G}aokao {C}hinese for Explainable Evaluation",
    author = "Tan, Hongye  and
      Wang, Xiaoyue  and
      Ji, Yu  and
      Li, Ru  and
      Li, Xiaoli  and
      Hu, Zhiwei  and
      Zhao, Yunxiao  and
      Han, Xiaoqi",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.113",
    doi = "10.18653/v1/2021.findings-acl.113",
    pages = "1319--1330",
}