Lexical Substitution Framework

Last update: Sep 15, 2022

LexSubGen

Lexical Substitution Framework

This repository contains the code to reproduce the results from the paper:

Arefyev Nikolay, Sheludko Boris, Podolskiy Alexander, Panchenko Alexander, "Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution", Proceedings of the 28th International Conference on Computational Linguistics, 2020

Installation

Clone the LexSubGen repository from GitHub:

git clone https://github.com/Samsung/LexSubGen
cd LexSubGen

Setup anaconda environment

1. Download and install conda
2. Create a new conda environment
   ```
   conda create -n lexsubgen python=3.7.4
   ```
3. Activate the conda environment
   ```
   conda activate lexsubgen
   ```
4. Install the requirements
   ```
   pip install -r requirements.txt
   ```
5. Download the spaCy resources and install context2vec and word_forms from their GitHub repositories
   ```
   ./init.sh
   ```

Setup Web Application

If you do not plan to use the Web Application, skip this section and go to the next!

1. Download and install NodeJS and npm.
2. Run the script to install the dependencies and create the build files:

bash web_app_setup.sh

Install lexsubgen library

python setup.py install

Results

Results of the lexical substitution task are presented in the following table. To reproduce them, follow the instructions above to install the correct dependencies.

| Model | SemEval GAP | SemEval P@1 | SemEval P@3 | SemEval R@10 | COINCO GAP | COINCO P@1 | COINCO P@3 | COINCO R@10 |
|---|---|---|---|---|---|---|---|---|
| OOC | 44.65 | 16.82 | 12.83 | 18.36 | 46.3 | 19.58 | 15.03 | 12.99 |
| C2V | 55.82 | 7.79 | 5.92 | 11.03 | 48.32 | 8.01 | 6.63 | 7.54 |
| C2V+embs | 53.39 | 28.01 | 21.72 | 33.52 | 50.73 | 29.64 | 24.0 | 21.97 |
| ELMo | 53.66 | 11.58 | 8.55 | 13.88 | 49.47 | 13.58 | 10.86 | 11.35 |
| ELMo+embs | 54.16 | 32.0 | 22.2 | 31.82 | 52.22 | 35.96 | 26.62 | 23.8 |
| BERT | 54.42 | 38.39 | 27.73 | 39.57 | 50.5 | 42.56 | 32.64 | 28.73 |
| BERT+embs | 53.87 | 41.64 | 30.59 | 43.88 | 50.85 | 46.05 | 35.63 | 31.67 |
| RoBERTa | 56.74 | 32.25 | 24.26 | 36.65 | 50.82 | 35.12 | 27.35 | 25.41 |
| RoBERTa+embs | 58.74 | 43.19 | 31.19 | 44.61 | 54.6 | 46.54 | 36.17 | 32.1 |
| XLNet | 59.12 | 31.75 | 22.83 | 34.95 | 53.39 | 38.16 | 28.58 | 26.47 |
| XLNet+embs | 59.62 | 49.53 | 34.9 | 47.51 | 55.63 | 51.5 | 39.92 | 35.12 |
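For readers unfamiliar with the precision and recall columns, here is a minimal sketch of precision-at-k and recall-at-k for a single instance, assuming the usual information-retrieval definitions. The function names and the toy data are hypothetical, and the repository's evaluation code may differ in details such as lemmatization or filtering of the target word. The table reports such scores as percentages averaged over a dataset.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Share of the top-k predicted substitutes that are gold substitutes."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for word in ranked[:k] if word in gold)
    return hits / k


def recall_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """Share of the gold substitutes that appear among the top-k predictions."""
    if not gold:
        return 0.0
    hits = sum(1 for word in ranked[:k] if word in gold)
    return hits / len(gold)


if __name__ == "__main__":
    # Toy instance (made up for illustration): model ranking vs. annotator substitutes.
    ranked = ["bright", "smart", "clever", "shiny"]
    gold = {"smart", "clever", "intelligent"}
    print(precision_at_k(ranked, gold, 1))
    print(precision_at_k(ranked, gold, 3))
    print(recall_at_k(ranked, gold, 10))
```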

Results reproduction

Here we list the XLNet reproduction commands that correspond to the results in the table above. Reproduction commands for all models are in scripts/lexsub-all-models.sh. Besides being saved to the run directory, all results are logged with mlflow. To inspect them, run mlflow ui in the LexSubGen directory and open the web page in a browser.

You can also use pytest to check reproducibility, but it may take a long time:

pytest tests/results_reproduction

XLNet:

XLNet SemEval07:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet'

XLNet CoInCo:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet --run-dir='debug/lexsub-all-models/coinco_xlnet' --force --experiment-name='lexsub-all-models' --run-name='coinco_xlnet'

XLNet with embeddings similarity SemEval07:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet_embs' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet_embs'

XLNet with embeddings similarity CoInCo:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet --run-dir='debug/lexsub-all-models/coinco_xlnet_embs' --force --experiment-name='lexsub-all-models' --run-name='coinco_xlnet_embs'
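The four commands above differ only in the generator config and the dataset config, so they can be generated from a template. The following is a sketch only: `build_lexsub_command` is a hypothetical helper, not part of lexsubgen, and it assumes the naming pattern visible in the commands above (`<dataset>_<model>` for the run name and run directory). Only `xlnet` and `xlnet_embs` are confirmed here as config names; check scripts/lexsub-all-models.sh for the others.

```python
import shlex
from typing import List


def build_lexsub_command(model: str, dataset: str) -> List[str]:
    """Assemble the evaluation command for one model/dataset pair.

    `model` is a generator config name such as "xlnet" or "xlnet_embs";
    `dataset` is a dataset config name such as "semeval_all" or "coinco".
    """
    run_name = f"{dataset}_{model}"
    return [
        "python", "lexsubgen/evaluations/lexsub.py", "solve",
        "--substgen-config-path", f"configs/subst_generators/lexsub/{model}.jsonnet",
        "--dataset-config-path", f"configs/dataset_readers/lexsub/{dataset}.jsonnet",
        f"--run-dir=debug/lexsub-all-models/{run_name}",
        "--force",
        "--experiment-name=lexsub-all-models",
        f"--run-name={run_name}",
    ]


if __name__ == "__main__":
    for model in ("xlnet", "xlnet_embs"):
        for dataset in ("semeval_all", "coinco"):
            # Print a shell-safe version of each command instead of running it.
            print(" ".join(shlex.quote(arg) for arg in build_lexsub_command(model, dataset)))
```

Passing the returned list to `subprocess.run` from the LexSubGen directory would start the evaluation; printing it first makes it easy to compare against the commands listed above.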

Word Sense Induction Results

| Model | SemEval 2013 (AVG) | SemEval 2010 (AVG) |
|---|---|---|
| XLNet | 33.4 | 52.1 |
| XLNet+embs | 37.3 | 54.1 |

To reproduce these results, use version 2.3.0 of transformers and run the following command:

bash scripts/wsi.sh

Web application

You can use the command-line interface to run the web application.

```
# Run main server
lexsubgen-app run --host HOST
                  --port PORT
                  [--model-configs CONFIGS]
                  [--start-ids START-IDS]
                  [--start-all]
                  [--restore-session]
```

Example:

```
# Run the server and serve the BERT and XLNet models.
# For BERT, create the server and the substitute generator immediately (resources are loaded into memory).
# For XLNet, create only the server.
lexsubgen-app run --host '0.0.0.0' \
                  --port 5000 \
                  --model-configs '["my_cool_configs/bert.jsonnet", "my_awesome_configs/xlnet.jsonnet"]' \
                  --start-ids '[0]'

# After the server shuts down, the session is dumped to the JSON file '~/.cache/lexsubgen/app_session.json'.
# The content of this file looks like:
# [
#     "my_cool_configs/bert.jsonnet",
#     "my_awesome_configs/xlnet.jsonnet"
# ]
# You can restore it with the '--restore-session' flag
lexsubgen-app run --host '0.0.0.0' \
                  --port 5000 \
                  --restore-session
# BERT and XLNet are restored now
```
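To see which models a restored session would serve, the session dump can be read directly. This is a minimal sketch that assumes the file is a plain JSON array of config paths, as the example above suggests. `load_session_configs` is a hypothetical helper and not part of lexsubgen.

```python
import json
from pathlib import Path
from typing import List

# Location taken from the comment in the example above.
SESSION_PATH = Path.home() / ".cache" / "lexsubgen" / "app_session.json"


def load_session_configs(path: Path = SESSION_PATH) -> List[str]:
    """Return the model config paths stored in a session dump, or [] if there is none."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        configs = json.load(f)
    # Assumption: the dump is a flat list of config file paths.
    if not isinstance(configs, list):
        raise ValueError(f"Unexpected session format in {path}")
    return [str(config) for config in configs]


if __name__ == "__main__":
    for config_path in load_session_configs():
        print(config_path)
```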

Arguments:

| Argument | Default | Description |
|---|---|---|
| `--help` | | Show this help message and exit |
| `--host` | | IP address of running server host |
| `--port` | `5000` | Port for starting the server |
| `--model-configs` | `[]` | List of file paths to the model configs. |
| `--start-ids` | `[]` | Zero-based indices of served models for which substitute generators will be created |
| `--start-all` | `False` | Whether to create substitute generators for all served models |
| `--restore-session` | `False` | Whether to restore session from previous Web application run |
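When launching the server from a script, the list-valued arguments are easy to get wrong by hand. The sketch below assembles the argument list and serializes `--model-configs` and `--start-ids` as JSON-style lists, matching the quoting used in the example above. `build_app_command` is a hypothetical helper, and the assumption that the CLI accepts exactly this list syntax rests only on that example.

```python
import json
import shlex
from typing import List, Sequence


def build_app_command(
    host: str,
    port: int = 5000,
    model_configs: Sequence[str] = (),
    start_ids: Sequence[int] = (),
    restore_session: bool = False,
) -> List[str]:
    """Assemble a `lexsubgen-app run` invocation from Python values."""
    cmd = ["lexsubgen-app", "run", "--host", host, "--port", str(port)]
    if model_configs:
        # Lists are passed as a single JSON-style string argument.
        cmd += ["--model-configs", json.dumps(list(model_configs))]
    if start_ids:
        cmd += ["--start-ids", json.dumps(list(start_ids))]
    if restore_session:
        cmd.append("--restore-session")
    return cmd


if __name__ == "__main__":
    cmd = build_app_command(
        "0.0.0.0",
        5000,
        ["my_cool_configs/bert.jsonnet", "my_awesome_configs/xlnet.jsonnet"],
        [0],
    )
    # Shell-safe rendering of the command, for copy and paste.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```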

FAQ

How to use a GPU? - Set the environment variable CUDA_VISIBLE_DEVICES to select the GPU used for inference: either run export CUDA_VISIBLE_DEVICES='1' beforehand or prefix your command with CUDA_VISIBLE_DEVICES='1'.
How to run the tests? - Use pytest: pytest tests
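The GPU selection can also be applied from Python when a command is started as a subprocess. This is a small sketch, not part of lexsubgen: `run_with_gpu` is a hypothetical helper that sets CUDA_VISIBLE_DEVICES only for the child process and leaves the parent environment untouched.

```python
import os
import subprocess
import sys
from typing import List


def run_with_gpu(command: List[str], gpu_ids: str = "1") -> subprocess.CompletedProcess:
    """Run `command` with CUDA_VISIBLE_DEVICES set to `gpu_ids` for that process only."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_ids)
    return subprocess.run(command, env=env, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child process that only echoes the variable it received.
    child = [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
    print(run_with_gpu(child, "1").stdout.strip())
```

In practice `command` would be one of the evaluation commands listed under Results reproduction.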
