Lexical Substitution Framework

Overview

LexSubGen

Lexical Substitution Framework

This repository contains the code to reproduce the results from the paper:

Arefyev Nikolay, Sheludko Boris, Podolskiy Alexander, Panchenko Alexander, "Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution", Proceedings of the 28th International Conference on Computational Linguistics, 2020

Installation

Clone LexSubGen repository from github.com.

git clone https://github.com/Samsung/LexSubGen
cd LexSubGen

Setup anaconda environment

  1. Download and install conda
  2. Create new conda environment
    conda create -n lexsubgen python=3.7.4
  3. Activate conda environment
    conda activate lexsubgen
  4. Install requirements
    pip install -r requirements.txt
  5. Download spacy resources and install context2vec and word_forms from github repositories
    ./init.sh

Setup Web Application

If you do not plan to use the Web Application, skip this section and go to the next!

  1. Download and install NodeJS and npm.
  2. Run script for install dependencies and create build files.
bash web_app_setup.sh

Install lexsubgen library

python setup.py install

Results

Results of the lexical substitution task are presented in the following table. To reproduce them, follow the instructions above to install the correct dependencies.

Model SemEval COINCO
GAP [email protected] [email protected] [email protected] GAP [email protected] [email protected] [email protected]
OOC 44.65 16.82 12.83 18.36 46.3 19.58 15.03 12.99
C2V 55.82 7.79 5.92 11.03 48.32 8.01 6.63 7.54
C2V+embs 53.39 28.01 21.72 33.52 50.73 29.64 24.0 21.97
ELMo 53.66 11.58 8.55 13.88 49.47 13.58 10.86 11.35
ELMo+embs 54.16 32.0 22.2 31.82 52.22 35.96 26.62 23.8
BERT 54.42 38.39 27.73 39.57 50.5 42.56 32.64 28.73
BERT+embs 53.87 41.64 30.59 43.88 50.85 46.05 35.63 31.67
RoBERTa 56.74 32.25 24.26 36.65 50.82 35.12 27.35 25.41
RoBERTa+embs 58.74 43.19 31.19 44.61 54.6 46.54 36.17 32.1
XLNet 59.12 31.75 22.83 34.95 53.39 38.16 28.58 26.47
XLNet+embs 59.62 49.53 34.9 47.51 55.63 51.5 39.92 35.12

Results reproduction

Here we list XLNet reproduction commands that correspond to the results presented in the table above. Reproduction commands for all models you can find in scripts/lexsub-all-models.sh Besides saving to the 'run-directory' all results are saved using mlflow. To check them you can run mlflow ui in LexSubGen directory and then open the web page in a browser.

Also you can use pytest to check the reproducibility. But it may take a long time:

pytest tests/results_reproduction
  • XLNet:

XLNet Semeval07:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet'

XLNet CoInCo:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet.jsonnet --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet --run-dir='debug/lexsub-all-models/coinco_xlnet' --force --experiment-name='lexsub-all-models' --run-name='coinco_xlnet'

XLNet with embeddings similarity Semeval07:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet --dataset-config-path configs/dataset_readers/lexsub/semeval_all.jsonnet --run-dir='debug/lexsub-all-models/semeval_all_xlnet_embs' --force --experiment-name='lexsub-all-models' --run-name='semeval_all_xlnet_embs'

XLNet with embeddings similarity CoInCo:

python lexsubgen/evaluations/lexsub.py solve --substgen-config-path configs/subst_generators/lexsub/xlnet_embs.jsonnet --dataset-config-path configs/dataset_readers/lexsub/coinco.jsonnet --run-dir='debug/lexsub-all-models/coinco_xlnet_embs' --force --experiment-name='lexsub-all-models' --run-name='coinco_xlnet_embs'

Word Sense Induction Results

Model SemEval 2013 SemEval 2010
AVG AVG
XLNet 33.4 52.1
XLNet+embs 37.3 54.1

To reproduce these results use 2.3.0 version of transformers and the following command:

bash scripts/wsi.sh

Web application

You could use command line interface to run Web application.

# Run main server
lexsubgen-app run --host HOST 
                  --port PORT 
                  [--model-configs CONFIGS] 
                  [--start-ids START-IDS] 
                  [--start-all] 
                  [--restore-session]

Example:

# Run server and serve models BERT and XLNet. 
# For BERT create server for serving model and substitute generator instantly (load resources in memory).
# For XLNet create only server.
lexsubgen-app run --host '0.0.0.0' 
                  --port 5000 
                  --model-configs '["my_cool_configs/bert.jsonnet", "my_awesome_configs/xlnet.jsonnet"]' 
                  --start-ids '[0]'

# After shutting down server JSON file with session dumps in the '~/.cache/lexsubgen/app_session.json'.
# The content of this file looks like:
# [
#     'my_cool_configs/bert.jsonnet',
#     'my_awesome_configs/xlnet.jsonnet',
# ]
# You can restore it with flag 'restore-session'
lexsubgen-app run --host '0.0.0.0' 
                  --port 5000 
                  --restore-session
# BERT and XLNet restored now
Arguments:
Argument Default Description
--help Show this help message and exit
--host IP address of running server host
--port 5000 Port for starting the server
--model-configs [] List of file paths to the model configs.
--start-ids [] Zero-based indices of served models for which substitute generators will be created
--start-all False Whether to create substitute generators for all served models
--restore-session False Whether to restore session from previous Web application run

FAQ

  1. How to use gpu? - You can use environment variable CUDA_VISIBLE_DEVICES to use gpu for inference: export CUDA_VISIBLE_DEVICES='1' or CUDA_VISIBLE_DEVICES='1' before your command.
  2. How to run tests? - You can use pytest: pytest tests
Owner
Samsung
Samsung Electronics Co.,Ltd.
Samsung
Benchmark for the generalization of 3D machine learning models across different remeshing/samplings of a surface.

Discretization Robust Correspondence Benchmark One challenge of machine learning on 3D surfaces is that there are many different representations/sampl

Nicholas Sharp 10 Sep 30, 2022
Iranian Cars Detection using Yolov5s, PyTorch

Iranian Cars Detection using Yolov5 Train 1- git clone https://github.com/ultralytics/yolov5 cd yolov5 pip install -r requirements.txt 2- Dataset ../

Nahid Ebrahimian 22 Dec 05, 2022
Classification of ecg datas for disease detection

ecg_classification Classification of ecg datas for disease detection

Atacan ÖZKAN 5 Sep 09, 2022
Pytorch implementation of Straight Sampling Network For Point Cloud Learning (ICIP2021).

Pytorch code for SS-Net This is a pytorch implementation of Straight Sampling Network For Point Cloud Learning (ICIP2021). Environment Code is tested

Sun Ran 1 May 18, 2022
This repository holds the code for the paper "Deep Conditional Gaussian Mixture Model forConstrained Clustering".

Deep Conditional Gaussian Mixture Model for Constrained Clustering. This repository holds the code for the paper Deep Conditional Gaussian Mixture Mod

17 Oct 30, 2022
ktrain is a Python library that makes deep learning and AI more accessible and easier to apply

Overview | Tutorials | Examples | Installation | FAQ | How to Cite Welcome to ktrain News and Announcements 2020-11-08: ktrain v0.25.x is released and

Arun S. Maiya 1.1k Jan 02, 2023
PyTorch Implementation of Fully Convolutional Networks. (Training code to reproduce the original result is available.)

pytorch-fcn PyTorch implementation of Fully Convolutional Networks. Requirements pytorch = 0.2.0 torchvision = 0.1.8 fcn = 6.1.5 Pillow scipy tqdm

Kentaro Wada 1.6k Jan 07, 2023
Codes for the AAAI'22 paper "TransZero: Attribute-guided Transformer for Zero-Shot Learning"

TransZero [arXiv] This repository contains the testing code for the paper "TransZero: Attribute-guided Transformer for Zero-Shot Learning" accepted to

Shiming Chen 52 Jan 01, 2023
Dynamica causal Bayesian optimisation

Dynamic Causal Bayesian Optimization This is a Python implementation of Dynamic Causal Bayesian Optimization as presented at NeurIPS 2021. Abstract Th

nd308 18 Nov 22, 2022
Deep Markov Factor Analysis (NeurIPS2021)

Deep Markov Factor Analysis (DMFA) Codes and experiments for deep Markov factor analysis (DMFA) model accepted for publication at NeurIPS2021: A. Farn

Sarah Ostadabbas 2 Dec 16, 2022
Breast Cancer Classification Model is applied on a different dataset

Breast Cancer Classification Model is applied on a different dataset

1 Feb 04, 2022
Library extending Jupyter notebooks to integrate with Apache TinkerPop and RDF SPARQL.

Graph Notebook: easily query and visualize graphs The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Us

Amazon Web Services 501 Dec 28, 2022
Implementation of QuickDraw - an online game developed by Google, combined with AirGesture - a simple gesture recognition application

QuickDraw - AirGesture Introduction Here is my python source code for QuickDraw - an online game developed by google, combined with AirGesture - a sim

Viet Nguyen 89 Dec 18, 2022
Human Detection - Pedestrian Detection using OpenCV Python

Pedestrian Detection using OpenCV Python Follow us on Instagram for Machine Lear

Hrishikesh Dutta 1 Jan 23, 2022
Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Updates (2020/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training. Pyr

1.3k Jan 04, 2023
《Train in Germany, Test in The USA: Making 3D Object Detectors Generalize》(CVPR 2020)

Train in Germany, Test in The USA: Making 3D Object Detectors Generalize This paper has been accpeted by Conference on Computer Vision and Pattern Rec

Xiangyu Chen 101 Jan 02, 2023
Auto-Lama combines object detection and image inpainting to automate object removals

Auto-Lama Auto-Lama combines object detection and image inpainting to automate object removals. It is build on top of DE:TR from Facebook Research and

44 Dec 09, 2022
PyTorch(Geometric) implementation of G^2GNN in "Imbalanced Graph Classification via Graph-of-Graph Neural Networks"

This repository is an official PyTorch(Geometric) implementation of G^2GNN in "Imbalanced Graph Classification via Graph-of-Graph Neural Networks". Th

Yu Wang (Jack) 13 Nov 18, 2022
“英特尔创新大师杯”深度学习挑战赛 赛道3:CCKS2021中文NLP地址相关性任务

基于 bert4keras 的一个baseline 不作任何 数据trick 单模 线上 最高可到 0.7891 # 基础 版 train.py 0.7769 # transformer 各层 cls concat 明神的trick https://xv44586.git

孙永松 7 Dec 28, 2021
A deep learning network built with TensorFlow and Keras to classify gender and estimate age.

Convolutional Neural Network (CNN). This repository contains a source code of a deep learning network built with TensorFlow and Keras to classify gend

Pawel Dziemiach 1 Dec 19, 2021