This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing.

Last update: Dec 14, 2022

Overview

Feedback Prize - Evaluating Student Writing

This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing. The competition can be found here: https://www.kaggle.com/competitions/feedback-prize-2021/

Datasets required

Download competition data from https://www.kaggle.com/competitions/feedback-prize-2021/data and extract it to ../input/feedback-prize-2021/
Download folds from https://www.kaggle.com/datasets/ubamba98/feedbackgroupshufflesplit1337 and extract it to ../input/feedbackgroupshufflesplit1337/groupshufflesplit_1337.p
Download and convert LSG Roberta from https://github.com/ccdv-ai/convert_checkpoint_to_lsg/blob/main/convert_roberta_checkpoint.py and place it in ../input/lsg-roberta-large

Use this command to convert roberta-large to LSG

$ python convert_roberta_checkpoint.py \
                        --initial_model roberta-large \
                        --model_name lsg-roberta-large \
                        --max_sequence_length 1536

Download fast tokenizer for Deberta V2/V3 from https://www.kaggle.com/datasets/nbroad/deberta-v2-3-fast-tokenizer and extract it to ../input/deberta-v2-3-fast-tokenizer/

Follow following instructions to manually add fast tokenizer to transformer library:

# The following is necessary if you want to use the fast tokenizer for deberta v2 or v3
# This must be done before importing transformers
import shutil
from pathlib import Path

# Path to installed transformer library
transformers_path = Path("/opt/conda/lib/python3.7/site-packages/transformers")

input_dir = Path("../input/deberta-v2-3-fast-tokenizer")

convert_file = input_dir / "convert_slow_tokenizer.py"
conversion_path = transformers_path/convert_file.name

if conversion_path.exists():
    conversion_path.unlink()

shutil.copy(convert_file, transformers_path)
deberta_v2_path = transformers_path / "models" / "deberta_v2"

for filename in ['tokenization_deberta_v2.py', 'tokenization_deberta_v2_fast.py']:
    filepath = deberta_v2_path/filename
    if filepath.exists():
        filepath.unlink()

    shutil.copy(input_dir/filename, filepath)

After this ../input directory should look something like this.

.
├── input
│   ├── feedback-prize-2021
│   │   ├── train/
│   │   ├── test/
│   │   ├── sample_submission.csv
│   │   └── train.csv
│   ├── lsg-roberta-large
│   │   ├── config.json
│   │   ├── merges.txt
│   │   ├── modeling.py
│   │   ├── pytorch_model.bin
│   │   ├── special_tokens_map.json
│   │   ├── tokenizer.json
│   │   ├── tokenizer_config.json
│   │   └── vocab.json
│   ├── deberta-v2-3-fast-tokenizer
│   │   ├── convert_slow_tokenizer.py
│   │   ├── deberta__init__.py
│   │   ├── tokenization_auto.py
│   │   ├── tokenization_deberta_v2.py
│   │   ├── tokenization_deberta_v2_fast.py
│   │   └── transformers__init__.py
│   └── feedbackgroupshufflesplit1337
│       └── groupshufflesplit_1337.p

or you can change the DATA_BASE_DIR in SETTINGS.json to download the files in your desired location.

Models and Training

Deberta large, Deberta xlarge, Deberta v2 xlarge, Deberta v3 large, Funnel transformer large and BigBird are trained using trainer.py

Example:

$ python trainer.py --fold 0 --pretrained_model google/bigbird-roberta-large

where pretrained_model can be microsoft/deberta-large, microsoft/deberta-xlarge, microsoft/deberta-v2-xlarge, microsoft/deberta-v3-large, funnel-transformer/large or google/bigbird-roberta-large

Deberta large with LSTM head and jaccard loss is trained using debertabilstm_trainer.py

Example:

$ python debertabilstm_trainer.py --fold 0

Longformer large with LSTM head is trained using longformerwithbilstm_trainer.py

Example:

$ python longformerwithbilstm_trainer.py --fold 0

LSG Roberta is trained with lsgroberta_trainer.py

Example:

$ python lsgroberta_trainer.py --fold 0

YOSO is trained with yoso_trainer.py

Example:

$ python yoso_trainer.py --fold 0

Inference

After training all the models, the outputs were pushed to Kaggle Datasets.

And the final inference kernel can be found here: https://www.kaggle.com/code/cdeotte/2nd-place-solution-cv741-public727-private740?scriptVersionId=90301836

Solution writeup: https://www.kaggle.com/competitions/feedback-prize-2021/discussion/313389

This is the solution for 2nd rank in Kaggle competition: Feedback Prize - Evaluating Student Writing.

Related tags

Overview

Feedback Prize - Evaluating Student Writing

Datasets required

Models and Training

Inference

Owner

Udbhav Bamba

Styled Augmented Translation

Music Source Separation; Train & Eval & Inference piplines and pretrained models we used for 2021 ISMIR MDX Challenge.

Python wrapper class for OpenVINO Model Server. User can submit inference request to OVMS with just a few lines of code

The final project of "Applying AI to 3D Medical Imaging Data" from "AI for Healthcare" nanodegree - Udacity.

Official Code for "Non-deep Networks"

How Do Adam and Training Strategies Help BNNs Optimization? In ICML 2021.

This repository contains answers of the Shopify Summer 2022 Data Science Intern Challenge.

The code for the NSDI'21 paper "BMC: Accelerating Memcached using Safe In-kernel Caching and Pre-stack Processing".

Music library streaming app written in Flask & VueJS

Auto-updating data to assist in investment to NEPSE

Official Pytorch Implementation of: "ImageNet-21K Pretraining for the Masses"(2021) paper

Federated_learning codes used for the the paper "Evaluation of Federated Learning Aggregation Algorithms" and "A Federated Learning Aggregation Algorithm for Pervasive Computing: Evaluation and Comparison"

A Pytorch Implementation for Compact Bilinear Pooling.

Torchreid: Deep learning person re-identification in PyTorch.

Deep Inertial Prediction (DIPr)

Unofficial implementation of the paper: PonderNet: Learning to Ponder in TensorFlow

Official pytorch implementation of paper Dual-Level Collaborative Transformer for Image Captioning (AAAI 2021).

ParmeSan: Sanitizer-guided Greybox Fuzzing

DilatedNet in Keras for image segmentation

Semi-SDP Semi-supervised parser for semantic dependency parsing.