System Combination for Grammatical Error Correction Based on Integer Programming

Last update: Mar 29, 2022

This repository contains the code and scripts that implement the system combination approach for grammatical error correction in Lin and Ng (2021).

Reference

Ruixi Lin and Hwee Tou Ng (2021). System Combination for Grammatical Error Correction Based on Integer Programming.

Please cite:

@inproceedings{lin2021gecip,
  author    = "Lin, Ruixi and Ng, Hwee Tou",
  title     = "System Combination for Grammatical Error Correction Based on Integer Programming",
  booktitle = "Proceedings of Recent Advances in Natural Language Processing",
  year      = "2021",
  pages     = "829-834"
}

Table of contents

Prerequisites

Example

License

Prerequisites

conda create --name comb python=3.6
conda activate comb
pip install spacy
python -m spacy download en

For the nonlinear integer programming solver, we use LINGO 10.0.

Note that educational institutions can obtain a free license to use the LINGO solver.

Example

This example combines the three GEC systems listed in the paper using the IP approach: UEdin-MS (https://aclanthology.org/W19-4427), Kakao (https://aclanthology.org/W19-4423), and Tohoku (https://aclanthology.org/D19-1119). The core functions for the IP objective are implemented in model.lg4, which is located under lingo/inputs.

Run python prepare_data.py -dir . -list kakao uedinms tohoku to generate aggregated TP, FP, and FN counts. The count files are stored under lingo/inputs.
Load model.lg4 into the LINGO console, set the input data path to the path of the count file, select the INLP model, and run the optimization. Save the solutions to lingo/outputs/sol_kakao_uedinms_tohoku.txt.
Run ./comb.sh . sol_kakao_uedinms_tohoku.txt to load the LINGO solutions, then merge and apply the edits. The resulting blind test file can be found under submissions. It can be zipped and submitted to the BEA CodaLab website (https://competitions.codalab.org/competitions/20228) for evaluation.
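
The selection step that the solver performs can be illustrated on toy data. The following is a minimal sketch, not the model in model.lg4: it assumes that exactly one system is chosen per error type and that the objective is F0.5 over the pooled TP, FP, and FN counts, and it replaces the LINGO solver with brute-force enumeration. The toy_counts dictionary, its numbers, and the function names are hypothetical; the layout of the count files written by prepare_data.py is not shown here.

```python
from itertools import product


def f_beta(tp, fp, fn, beta=0.5):
    """F-beta from pooled counts; returns 0.0 when precision and recall are both 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_assignment(counts, beta=0.5):
    """Pick one system per error type so that the pooled F-beta is maximal.

    counts[error_type][system] = (tp, fp, fn). Every combination is enumerated,
    which is only feasible for small toy inputs (assumption: an IP solver
    would be used instead for real data).
    """
    types = sorted(counts)
    candidates = [sorted(counts[t]) for t in types]
    best_score, best_choice = -1.0, None
    for choice in product(*candidates):
        tp = sum(counts[t][s][0] for t, s in zip(types, choice))
        fp = sum(counts[t][s][1] for t, s in zip(types, choice))
        fn = sum(counts[t][s][2] for t, s in zip(types, choice))
        score = f_beta(tp, fp, fn, beta)
        if score > best_score:
            best_score, best_choice = score, dict(zip(types, choice))
    return best_score, best_choice


# Hypothetical counts for two ERRANT error types and two systems.
toy_counts = {
    "R:VERB": {"kakao": (30, 10, 20), "uedinms": (25, 5, 25)},
    "M:DET": {"kakao": (40, 20, 10), "uedinms": (35, 30, 15)},
}

score, assignment = best_assignment(toy_counts)
print(assignment, round(score, 4))
```

In this toy input, mixing systems across error types scores higher than using either system for every type, which is the effect the combination aims for.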

The data folder provides the individual GEC system output files and the .m2 files generated with ERRANT for the listed systems. For more information, please visit the ERRANT GitHub page.
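
For readers unfamiliar with the .m2 format, the sketch below reads it into Python structures. It assumes the standard M2 layout, where an S line holds the tokenized source sentence, each A line holds one edit with fields separated by |||, and sentences are separated by a blank line. The function name parse_m2 and the sample text are hypothetical and are not taken from this repository.

```python
def parse_m2(text):
    """Parse M2-formatted text into a list of (tokens, edits) pairs."""
    sentences = []
    for block in text.strip().split("\n\n"):
        if not block.strip():
            continue  # skip empty input
        tokens, edits = [], []
        for line in block.splitlines():
            if line.startswith("S "):
                tokens = line[2:].split()
            elif line.startswith("A "):
                fields = line[2:].split("|||")
                start, end = map(int, fields[0].split())
                edits.append({
                    "start": start,          # token offset where the edit begins
                    "end": end,              # token offset where the edit ends (exclusive)
                    "type": fields[1],       # error type, e.g. R:VERB:SVA
                    "correction": fields[2],
                    "annotator": int(fields[-1]),
                })
        sentences.append((tokens, edits))
    return sentences


# Hypothetical sample: one sentence with an edit, one with a noop annotation.
sample = """S This are a sentence .
A 1 2|||R:VERB:SVA|||is|||REQUIRED|||-NONE-|||0

S This is fine .
A -1 -1|||noop|||-NONE-|||REQUIRED|||-NONE-|||0
"""

parsed = parse_m2(sample)
for tokens, edits in parsed:
    print(" ".join(tokens), [(e["start"], e["end"], e["type"]) for e in edits])
```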

We include the IP-combined .m2 files under merged_m2, and the corresponding text files under submissions.
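
The relation between an .m2 file and its corrected text file can be sketched as follows. This is a minimal illustration, not the logic of comb.sh: it assumes the edits of a sentence do not overlap, that each edit is given as a hypothetical (start, end, correction) tuple of token offsets, and that a negative start marks a noop annotation.

```python
def apply_edits(tokens, edits):
    """Apply non-overlapping token-level edits and return the corrected sentence."""
    out = list(tokens)
    # Work from right to left so that earlier offsets stay valid after each change.
    for start, end, correction in sorted(edits, reverse=True):
        if start < 0:
            continue  # noop annotation: nothing to change
        # An empty correction deletes the span; start == end inserts tokens.
        out[start:end] = correction.split()
    return " ".join(out)


source = "This are sentence .".split()
corrected = apply_edits(source, [(1, 2, "is"), (2, 2, "a")])
print(corrected)
```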

License

The source code and models in this repository are licensed under the GNU General Public License v3.0 (see LICENSE). For further research interests and commercial use of the code and models, please contact Ruixi Lin ([email protected]) and Prof. Hwee Tou Ng ([email protected]).

Owner

NUS NLP Group
