Code for "Learning Structural Edits via Incremental Tree Transformations" (ICLR'21)

Overview

Learning Structural Edits via Incremental Tree Transformations

Code for "Learning Structural Edits via Incremental Tree Transformations" (ICLR'21)

1. Prepare Environment

We recommend using conda to manage the environment:

conda env create -n "structural_edits" -f structural_edits.yml
conda activate structural_edits

Install the punkt tokenizer:

python
>>> import nltk
>>> nltk.download('punkt')
>>> <ctrl-D>

2. Data

Please extract the datasets and vocabulary files by:

cd source_data
tar -xzvf githubedits.tar.gz

All necessary source data has been included as the following:

| --source_data
|       |-- githubedits
|           |-- githubedits.{train|train_20p|dev|test}.jsonl
|           |-- csharp_fixers.jsonl
|           |-- vocab.from_repo.{080910.freq10|edit}.json
|           |-- Syntax.xml
|           |-- configs
|               |-- ...(model config json files)

A sample file containing 20% of the GitHubEdits training data is included as source_data/githubedits/githubedits.train_20p.jsonl for running small experiments.

We have generated and included the vocabulary files as well. To create your own vocabulary, see edit_components/vocab.py.

Copyright: The original data were downloaded from Yin et al., (2019).

3. Experiments

See training and test scripts in scripts/githubedits/. Please configure the PYTHONPATH environment variable in line 6.

3.1 Training

For training, uncomment the desired setting in scripts/githubedits/train.sh and run:

bash scripts/githubedits/train.sh source_data/githubedits/configs/CONFIGURATION_FILE

where CONFIGURATION_FILE is the json file of your setting.

Supervised Learning

For example, if you want to train Graph2Edit + Sequence Edit Encoder on GitHubEdits's 20% sample data, please uncomment only line 21-25 in scripts/githubedits/train.sh and run:

bash scripts/githubedits/train.sh source_data/githubedits/configs/graph2iteredit.seq_edit_encoder.20p.json

(Note: when you run the experiment for the first time, you might need to wait for ~15 minutes for data preprocessing.)

Imitation Learning

To further train the model with PostRefine imitation learning, please replace FOLDER_OF_SUPERVISED_PRETRAINED_MODEL with your model dir in source_data/githubedits/configs/graph2iteredit.seq_edit_encoder.20p.postrefine.imitation.json. Uncomment only line 27-31 in scripts/githubedits/train.sh and run:

bash scripts/githubedits/train.sh source_data/githubedits/configs/graph2iteredit.seq_edit_encoder.20p.postrefine.imitation.json

3.2 Test

To test a trained model, first uncomment only the desired setting in scripts/githubedits/test.sh and replace work_dir with your model directory, and then run:

bash scripts/githubedits/test.sh

4. Reference

If you use our code and data, please cite our paper:

@inproceedings{yao2021learning,
    title={Learning Structural Edits via Incremental Tree Transformations},
    author={Ziyu Yao and Frank F. Xu and Pengcheng Yin and Huan Sun and Graham Neubig},
    booktitle={International Conference on Learning Representations},
    year={2021},
    url={https://openreview.net/forum?id=v9hAX77--cZ}
}

Our implementation is adapted from TranX and Graph2Tree. We are grateful to the two work!

@inproceedings{yin18emnlpdemo,
    title = {{TRANX}: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation},
    author = {Pengcheng Yin and Graham Neubig},
    booktitle = {Conference on Empirical Methods in Natural Language Processing (EMNLP) Demo Track},
    year = {2018}
}
@inproceedings{yin2018learning,
    title={Learning to Represent Edits},
    author={Pengcheng Yin and Graham Neubig and Miltiadis Allamanis and Marc Brockschmidt and Alexander L. Gaunt},
    booktitle={International Conference on Learning Representations},
    year={2019},
    url={https://openreview.net/forum?id=BJl6AjC5F7},
}
Owner
NeuLab
Graham Neubig's Lab at LTI/CMU
NeuLab
A Pytorch Implementation of ClariNet

ClariNet A Pytorch Implementation of ClariNet (Mel Spectrogram -- Waveform) Requirements PyTorch 0.4.1 & python 3.6 & Librosa Examples Step 1. Downlo

Sungwon Kim 286 Sep 15, 2022
Easy and comprehensive assessment of predictive power, with support for neuroimaging features

Documentation: https://raamana.github.io/neuropredict/ News As of v0.6, neuropredict now supports regression applications i.e. predicting continuous t

Pradeep Reddy Raamana 93 Nov 29, 2022
[TIP 2020] Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion

Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion Code for Multi-Temporal Scene Classification and Scene Ch

Lixiang Ru 33 Dec 12, 2022
Tensorflow implementation of MIRNet for Low-light image enhancement

MIRNet Tensorflow implementation of the MIRNet architecture as proposed by Learning Enriched Features for Real Image Restoration and Enhancement. Lanu

Soumik Rakshit 91 Jan 06, 2023
Semi-supervised Learning for Sentiment Analysis

Neural-Semi-supervised-Learning-for-Text-Classification-Under-Large-Scale-Pretraining Code, models and Datasets for《Neural Semi-supervised Learning fo

47 Jan 01, 2023
UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.

Unified Multi-modal Transformers This repository maintains the official implementation of the paper UMT: Unified Multi-modal Transformers for Joint Vi

Applied Research Center (ARC), Tencent PCG 84 Jan 04, 2023
OBBDetection: an oriented object detection toolbox modified from MMdetection

OBBDetection note: If you have questions or good suggestions, feel free to propose issues and contact me. introduction OBBDetection is an oriented obj

MIXIAOXIN_HO 3 Nov 11, 2022
This thesis is mainly concerned with state-space methods for a class of deep Gaussian process (DGP) regression problems

Doctoral dissertation of Zheng Zhao This thesis is mainly concerned with state-space methods for a class of deep Gaussian process (DGP) regression pro

Zheng Zhao 21 Nov 14, 2022
This repository contains the implementation of the paper: "Towards Frequency-Based Explanation for Robust CNN"

RobustFreqCNN About This repository contains the implementation of the paper "Towards Frequency-Based Explanation for Robust CNN" arxiv. It primarly d

Sarosij Bose 2 Jan 23, 2022
code for TCL: Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022

Vision-Language Pre-Training with Triple Contrastive Learning, CVPR 2022 News (03/16/2022) upload retrieval checkpoints finetuned on COCO and Flickr T

187 Jan 02, 2023
Adversarial Color Enhancement: Generating Unrestricted Adversarial Images by Optimizing a Color Filter

ACE Please find the preliminary version published at BMVC 2020 in the folder BMVC_version, and its extended journal version in Journal_version. Datase

28 Dec 25, 2022
A way to store images in YAML.

YAMLImg A way to store images in YAML. I made this after seeing Roadcrosser's JSON-G because it was too inspiring to ignore this opportunity. Installa

5 Mar 14, 2022
Official release of MSHT: Multi-stage Hybrid Transformer for the ROSE Image Analysis of Pancreatic Cancer axriv: http://arxiv.org/abs/2112.13513

MSHT: Multi-stage Hybrid Transformer for the ROSE Image Analysis This is the official page of the MSHT with its experimental script and records. We de

Tianyi Zhang 53 Dec 27, 2022
PyTorch implementation of Lip to Speech Synthesis with Visual Context Attentional GAN (NeurIPS2021)

Lip to Speech Synthesis with Visual Context Attentional GAN This repository contains the PyTorch implementation of the following paper: Lip to Speech

6 Nov 02, 2022
Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

One-Shot Voice Conversion with Weight Adaptive Instance Normalization By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain. This rep

31 Dec 07, 2022
Learning to Identify Top Elo Ratings with A Dueling Bandits Approach

Learning to Identify Top Elo Ratings We propose two algorithms MaxIn-Elo and MaxIn-mElo to solve the top players identification on the transitive and

2 Jan 14, 2022
Related resources for our EMNLP 2021 paper

Plan-then-Generate: Controlled Data-to-Text Generation via Planning Authors: Yixuan Su, David Vandyke, Sihui Wang, Yimai Fang, and Nigel Collier Code

Yixuan Su 61 Jan 03, 2023
Code from the paper "High-Performance Brain-to-Text Communication via Handwriting"

High-Performance Brain-to-Text Communication via Handwriting Overview This repo is associated with this manuscript, preprint and dataset. The code can

Francis R. Willett 306 Jan 03, 2023
Code for the paper: Adversarial Training Against Location-Optimized Adversarial Patches. ECCV-W 2020.

Adversarial Training Against Location-Optimized Adversarial Patches arXiv | Paper | Code | Video | Slides Code for the paper: Sukrut Rao, David Stutz,

Sukrut Rao 32 Dec 13, 2022
đź’ˇ Learnergy is a Python library for energy-based machine learning models.

Learnergy: Energy-based Machine Learners Welcome to Learnergy. Did you ever reach a bottleneck in your computational experiments? Are you tired of imp

Gustavo Rosa 57 Nov 17, 2022