Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Last update: Jul 07, 2021

Overview

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks.

We provide reference implementations of various sequence modeling papers:

List of implemented papers

Convolutional Neural Networks (CNN)
LightConv and DynamicConv models
- Pay Less Attention with Lightweight and Dynamic Convolutions (Wu et al., 2019)
Long Short-Term Memory (LSTM) networks
- Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015)
Transformer (self-attention) networks
Non-autoregressive Transformers
- Non-Autoregressive Neural Machine Translation (Gu et al., 2017)
- Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement (Lee et al., 2018)
- Insertion Transformer: Flexible Sequence Generation via Insertion Operations (Stern et al., 2019)
- Mask-Predict: Parallel Decoding of Conditional Masked Language Models (Ghazvininejad et al., 2019)
- Levenshtein Transformer (Gu et al., 2019)
Fine-tuning
- Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al., 2020)

What's New:

March 2021: Added full parameter and optimizer state sharding + CPU offloading
February 2021: Added LASER training code
December 2020: Added Adaptive Attention Span code
December 2020: GottBERT model and code released
November 2020: Adopted the Hydra configuration framework
- see documentation explaining how to use it for new and existing projects
November 2020: fairseq 0.10.0 released
October 2020: Added R3F/R4F (Better Fine-Tuning) code
October 2020: Deep Transformer with Latent Depth code released
October 2020: Added CRISS models and code

Previous updates

September 2020: Added Linformer code
September 2020: Added pointer-generator networks
August 2020: Added lexically constrained decoding
August 2020: wav2vec2 models and code released
July 2020: Unsupervised Quality Estimation code released
May 2020: Follow fairseq on Twitter
April 2020: Monotonic Multihead Attention code released
April 2020: Quant-Noise code released
April 2020: Initial model parallel support and 11B-parameter unidirectional LM released
March 2020: Byte-level BPE code released
February 2020: mBART model and code released
February 2020: Added tutorial for back-translation
December 2019: fairseq 0.9.0 released
November 2019: VizSeq released (a visual analysis toolkit for evaluating fairseq models)
November 2019: CamemBERT model and code released
November 2019: BART model and code released
November 2019: XLM-R models and code released
September 2019: Non-autoregressive translation code released
August 2019: WMT'19 models released
July 2019: fairseq relicensed under MIT license
July 2019: RoBERTa models and code released
June 2019: wav2vec models and code released

Features:

multi-GPU training on one machine or across multiple machines (data and model parallel)
fast generation on both CPU and GPU with multiple search algorithms implemented:
- beam search
- Diverse Beam Search (Vijayakumar et al., 2016)
- sampling (unconstrained, top-k and top-p/nucleus)
- lexically constrained decoding (Post & Vilar, 2018)
gradient accumulation enables training with large mini-batches even on a single GPU
mixed precision training (trains faster with less GPU memory on NVIDIA tensor cores)
extensible: easily register new models, criterions, tasks, optimizers and learning rate schedulers
flexible configuration based on Hydra allowing a combination of code, command-line and file based configuration
full parameter and optimizer state sharding
offloading parameters to CPU
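The sampling strategies listed above differ only in which tokens stay eligible before each draw. The following is a generic pure-Python sketch of top-k and top-p (nucleus) filtering over a single decoding step's logits; it is not fairseq's implementation, and the function names `top_k_top_p_filter` and `sample` are hypothetical.

```python
import math
import random


def top_k_top_p_filter(logits, k=0, p=1.0):
    """Return {token_id: prob} after top-k and top-p (nucleus) filtering.

    k=0 disables top-k; p=1.0 disables top-p. Kept probabilities are renormalized.
    """
    # Softmax, subtracting the max logit for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token ids ordered from most to least probable; top-k truncates this list.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if k > 0:
        order = order[:k]

    # Nucleus: keep the shortest high-probability prefix whose mass reaches p.
    # Mass is measured on the unfiltered softmax here; some implementations
    # renormalize after top-k before applying top-p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break

    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, k=0, p=1.0, rng=random):
    """Draw one token id from the filtered, renormalized distribution."""
    dist = top_k_top_p_filter(logits, k=k, p=p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

For example, `sample([2.0, 1.0, 0.5, -1.0], p=0.9)` can only return one of the tokens that together cover at least 90% of the probability mass. Unconstrained sampling is the special case `k=0, p=1.0`.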

We also provide pre-trained models for translation and language modeling with a convenient torch.hub interface:

import torch

en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model')
en2de.translate('Hello world', beam=5)
# 'Hallo Welt'

See the PyTorch Hub tutorials for translation and RoBERTa for more examples.

Requirements and Installation

PyTorch version >= 1.5.0
Python version >= 3.6
For training new models, you'll also need an NVIDIA GPU and NCCL
To install fairseq and develop locally:

git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./

# on macOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./

# to install the latest stable release (0.10.x)
# pip install fairseq

For faster training install NVIDIA's apex library:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./

For large datasets install PyArrow: pip install pyarrow
If you use Docker, make sure to increase the shared memory size, either with --ipc=host or --shm-size, as command-line options to nvidia-docker run.

Getting Started

The full documentation contains instructions for getting started, training new models and extending fairseq with new model types and tasks.

Pre-trained models and examples

We provide pre-trained models and pre-processed, binarized test sets for several tasks listed below, as well as example training and evaluation commands.

Translation: convolutional and transformer models are available
Language Modeling: convolutional and transformer models are available

We also have more detailed READMEs to reproduce results from specific papers:

Join the fairseq community

Twitter: https://twitter.com/fairseq
Facebook page: https://www.facebook.com/groups/fairseq.users
Google group: https://groups.google.com/forum/#!forum/fairseq-users

License

fairseq(-py) is MIT-licensed. The license applies to the pre-trained models as well.

Citation

Please cite as:

@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}


Releases (v0.10.2)

v0.10.2 (Jan 5, 2021)
Bug fixes:

Fix register_model_architecture for Transformer language model (#3097)

Fix logging to use stdout instead of stderr (#3052)

v0.10.1 (Nov 21, 2020)

This minor release includes fixes for torch.distributed.launch, --user-dir and a few smaller bugs. We also include prebuilt wheels for common platforms.
fairseq-0.10.1-cp36-cp36m-macosx_10_9_x86_64.whl (1.08 MB)
fairseq-0.10.1-cp36-cp36m-manylinux1_x86_64.whl (1.61 MB)
fairseq-0.10.1-cp37-cp37m-macosx_10_9_x86_64.whl (1.07 MB)
fairseq-0.10.1-cp37-cp37m-manylinux1_x86_64.whl (1.61 MB)
fairseq-0.10.1-cp38-cp38-macosx_10_9_x86_64.whl (1.07 MB)
fairseq-0.10.1-cp38-cp38-manylinux1_x86_64.whl (1.61 MB)
v0.10.0 (Nov 12, 2020)
It's been nearly a year since our last release (0.9.0)! There have been numerous changes and new features added since then, which we've tried to summarize below. While this release keeps the same major version as our previous release (0.x.x), if you have code that relies on 0.9.0, you'll likely need to adapt it before updating to 0.10.0.

Looking forward, this will also be the last significant release with the 0.x.x numbering. The next release will be 1.0.0 and will include a major migration to the Hydra configuration system, with an eye towards modularizing fairseq to be more usable as a library.

Changelog:

New papers:

Reducing Transformer Depth on Demand with Structured Dropout (Fan et al., 2019)

MBART: Multilingual Denoising Pre-training for Neural Machine Translation ({Liu*,Gu*,Goyal*} et al., 2020)

Neural Machine Translation with Byte-Level Subwords (Wang et al., 2019)

Training with Quantization Noise for Extreme Model Compression ({Fan*,Stock*} et al., 2019)

Monotonic Multihead Attention (Ma et al., 2020)

Unsupervised Quality Estimation for Neural Machine Translation (Fomicheva et al., 2020)

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020)

Lexically constrained decoding with dynamic beam allocation (Post & Vilar, 2018)

Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models (Enarvi et al., 2020)

Linformer: Self-Attention with Linear Complexity (Wang et al., 2020)

Cross-lingual Retrieval for Iterative Self-Supervised Training (Tran et al., 2020)

Deep Transformers with Latent Depth (Li et al., 2020)

Better Fine-Tuning by Reducing Representational Collapse (Aghajanyan et al., 2020)

Major new features:

TorchScript support for Transformer and SequenceGenerator (PyTorch 1.6+ only)

Model parallel training support (see Megatron-11b)

TPU support via --tpu and --bf16 options (775122950d145382146e9120308432a9faf9a9b8)

Added VizSeq (a visual analysis toolkit for evaluating fairseq models)

Migrated to Python logging (fb76dac1c4e314db75f9d7a03cb4871c532000cb)

Added “SlowMo” distributed training backend (0dac0ff3b1d18db4b6bb01eb0ea2822118c9dd13)

Added Optimizer State Sharding (ZeRO) (5d7ed6ab4f92d20ad10f8f792b8703e260a938ac)

Added several features to improve speech recognition support in fairseq: CTC criterion, external ASR decoder support (currently only wav2letter decoder) with KenLM and fairseq language model fusion
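Language model fusion, as mentioned above for the ASR decoder, is commonly formulated as shallow (log-linear) fusion: each candidate's translation or acoustic model log-probability is combined with a weighted language model log-probability before ranking. The sketch below shows that scoring rule for one decoding step; fairseq's exact scoring, normalization and option names are not reproduced, and `lm_weight`, the helper names and the toy probabilities are hypothetical.

```python
import math


def fused_score(tm_logprob, lm_logprob, lm_weight):
    """Shallow fusion: model log-prob plus a weighted language model log-prob."""
    return tm_logprob + lm_weight * lm_logprob


def best_next_tokens(candidates, lm_weight, beam_size):
    """Rank one step's candidates by fused score and keep the top `beam_size`.

    `candidates` maps token -> (tm_logprob, lm_logprob), both natural-log probabilities.
    """
    ranked = sorted(
        candidates,
        key=lambda tok: fused_score(*candidates[tok], lm_weight),
        reverse=True,
    )
    return ranked[:beam_size]


# Hypothetical scores for one step: the LM strongly disfavours "Erde" in this context.
step = {
    "Welt": (math.log(0.40), math.log(0.30)),
    "Erde": (math.log(0.45), math.log(0.01)),
    "Welten": (math.log(0.15), math.log(0.05)),
}
```

With `lm_weight=0.0` the ranking is driven by the model alone and "Erde" comes first; with `lm_weight=0.5` the language model pushes "Welt" ahead. The weight is a tuning knob: too large and the output drifts toward fluent text that ignores the input.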

Minor features:

Added --patience for early stopping

Added --shorten-method=[none|truncate|random_crop] to language modeling (and other) tasks

Added --eval-bleu for computing BLEU scores during training (60fbf64f302a825eee77637a0b7de54fde38fb2c)

Added support for training Hugging Face models (e.g. hf_gpt2) (2728f9b06d9a3808cc7ebc2afa1401eddef35e35)

Added FusedLAMB optimizer (--optimizer=lamb) (f75411af2690a54a5155871f3cf7ca1f6fa15391)

Added LSTM-based language model (lstm_lm) (9f4256edf60554afbcaadfa114525978c141f2bd)

Added dummy tasks and models for benchmarking (91f05347906e80e6705c141d4c9eb7398969a709; a541b19d853cf4a5209d3b8f77d5d1261554a1d9)

Added tutorial and pretrained models for paraphrasing (630701eaa750efda4f7aeb1a6d693eb5e690cab1)

Support quantization for Transformer (6379573c9e56620b6b4ddeb114b030a0568ce7fe)

Support multi-GPU validation in fairseq-validate (2f7e3f33235b787de2e34123d25f659e34a21558)

Support batched inference in hub interface (3b53962cd7a42d08bcc7c07f4f858b55bf9bbdad)

Support for language model fusion in standard beam search (5379461e613263911050a860b79accdf4d75fd37)
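The --patience option above stops training once validation stops improving. The following is a minimal sketch of that bookkeeping, assuming the rule is "stop after N consecutive validation runs without a new best value"; it is an illustration, not fairseq's code, and `EarlyStopper` and `first_stop` are hypothetical names.

```python
class EarlyStopper:
    """Signal a stop after `patience` consecutive validation runs without improvement."""

    def __init__(self, patience, maximize=False):
        self.patience = patience
        self.maximize = maximize  # True for metrics like BLEU, False for loss
        self.best = None
        self.num_bad_runs = 0

    def update(self, value):
        """Record one validation result; return True if training should stop."""
        if self.best is None:
            improved = True
        elif self.maximize:
            improved = value > self.best
        else:
            improved = value < self.best
        if improved:
            self.best = value
            self.num_bad_runs = 0
        else:
            self.num_bad_runs += 1
        return self.num_bad_runs >= self.patience


def first_stop(values, patience, maximize=False):
    """Return the index of the validation run that triggers a stop, or None."""
    stopper = EarlyStopper(patience, maximize)
    for i, value in enumerate(values):
        if stopper.update(value):
            return i
    return None
```

For validation losses `[3.0, 2.5, 2.6, 2.7, 2.55]` and a patience of 3, the three runs after the best value (2.5) fail to improve on it, so the fifth run triggers the stop.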

Breaking changes:

Updated requirements to Python 3.6+ and PyTorch 1.5+

--max-sentences renamed to --batch-size

Main entry point scripts (eval_lm.py, generate.py, etc.) moved from the root directory into fairseq_cli

Changed format for generation output; H- now corresponds to tokenized system outputs and newly added D- lines correspond to detokenized outputs (f353913420b6ef8a31ecc55d2ec0c988178698e0)

We now log the stats from the log-interval (displayed as train_inner) instead of a rolling average over each epoch.

SequenceGenerator/Scorer does not print alignment by default, re-enable with --print-alignment

Print base-2 scores in generation scripts (660d69fd2bdc4c3468df7eb26b3bbd293c793f94)

Incremental decoding interface changed to use FairseqIncrementalState (4e48c4ae5da48a5f70c969c16793e55e12db3c81; 88185fcc3f32bd24f65875bd841166daa66ed301)

Refactor namespaces in Criterions to support library usage (introduce LegacyFairseqCriterion for backward compatibility) (46b773a393c423f653887c382e4d55e69627454d)

Deprecate FairseqCriterion::aggregate_logging_outputs interface, use FairseqCriterion::reduce_metrics instead (86793391e38bf88c119699bfb1993cb0a7a33968)

Moved fairseq.meters to fairseq.logging.meters and added new metrics aggregation module (fairseq.logging.metrics) (1e324a5bbe4b1f68f9dadf3592dab58a54a800a8; f8b795f427a39c19a6b7245be240680617156948)

Reset mid-epoch stats every log-interval steps (244835d811c2c66b1de2c5e86532bac41b154c1a)

Ignore duplicate entries in dictionary files (dict.txt) and support manual overwrite with #fairseq:overwrite option (dd1298e15fdbfc0c3639906eee9934968d63fc29; 937535dba036dc3759a5334ab5b8110febbe8e6e)

Use 1-based indexing for epochs everywhere (aa79bb9c37b27e3f84e7a4e182175d3b50a79041)
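Because the generation output format changed (H- for tokenized hypotheses, new D- lines for detokenized ones, scores in base 2), downstream scripts that parse it may need updating. The parser below is a sketch that assumes tab-separated lines of the form `S-<id>\tsource`, `T-<id>\treference`, `H-<id>\tscore\ttokenized` and `D-<id>\tscore\tdetokenized`; check that layout against your own output before relying on it. The perplexity helper further assumes the score is a per-token average log2-probability, which depends on settings such as the length penalty. Both function names are hypothetical.

```python
def parse_generate_output(lines):
    """Group S-/T-/H-/D- lines from generation output by sentence id.

    Other lines (log messages, P- positional scores, etc.) are skipped.
    """
    results = {}
    for raw in lines:
        tag, sep, rest = raw.rstrip("\n").partition("\t")
        prefix, dash, sent_id = tag.partition("-")
        if not sep or not dash or not sent_id.isdigit():
            continue  # not an "<X>-<id>\t..." line
        if prefix not in ("S", "T", "H", "D"):
            continue
        entry = results.setdefault(int(sent_id), {})
        if prefix in ("H", "D"):
            score, _, text = rest.partition("\t")
            entry[prefix] = (float(score), text)
        else:
            entry[prefix] = rest
    return results


def per_token_perplexity(base2_score):
    """Perplexity from an (assumed) average per-token log2-probability."""
    return 2.0 ** (-base2_score)


sample_lines = [
    "S-0\tHello world !",
    "T-0\tHallo Welt !",
    "H-0\t-0.25\tHallo Welt !",
    "D-0\t-0.25\tHallo Welt!",
    "P-0\t-0.3 -0.2 -0.25 -0.25",
    "2021-01-05 | INFO | fairseq_cli.generate | done",
]
```

Keeping both H- and D- text lets you score tokenized output with one metric and detokenized output with another (for example, BLEU computed on detokenized text).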

Minor interface changes:

Added FairseqTask::begin_epoch hook (122fc1db49534a5ca295fcae1b362bbd6308c32f)

FairseqTask::build_generator interface changed (cd2555a429b5f17bc47260ac1aa61068d9a43db8)

Change RobertaModel base class to FairseqEncoder (307df5604131dc2b93cc0a08f7c98adbfae9d268)

Expose FairseqOptimizer.param_groups property (8340b2d78f2b40bc365862b24477a0190ad2e2c2)

Deprecate --fast-stat-sync and replace with FairseqCriterion::logging_outputs_can_be_summed interface (fe6c2edad0c1f9130847b9a19fbbef169529b500)

--raw-text and --lazy-load are fully deprecated; use --dataset-impl instead

Mixture of expert tasks moved to examples/ (8845dcf5ff43ca4d3e733ade62ceca52f1f1d634)

Performance improvements:

Use cross entropy from apex for improved memory efficiency (5065077dfc1ec4da5246a6103858641bfe3c39eb)

Added buffered dataloading (--data-buffer-size) (411531734df8c7294e82c68e9d42177382f362ef)
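Buffered data loading overlaps batch preparation with training by letting a background producer fill a bounded buffer ahead of the consumer. The following is a generic producer/consumer sketch of that idea using only the standard library; it is not fairseq's implementation, and `buffered` is a hypothetical name whose `buffer_size` plays the role that --data-buffer-size is described as playing.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def buffered(iterable, buffer_size):
    """Yield items from `iterable`, prefetched by a background thread.

    At most `buffer_size` items wait in the queue at once. Errors raised while
    producing items are re-raised in the consumer after the buffer drains.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    q = queue.Queue(maxsize=buffer_size)
    errors = []

    def producer():
        try:
            for item in iterable:
                q.put(item)  # blocks while the buffer is full
        except BaseException as exc:
            errors.append(exc)
        finally:
            q.put(_DONE)

    # Daemon thread: if the consumer stops early, a blocked producer will not
    # keep the interpreter alive.
    thread = threading.Thread(target=producer, daemon=True)
    thread.start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        yield item
    thread.join()
    if errors:
        raise errors[0]
```

Iteration order is preserved, so `list(buffered(range(10), 2))` yields the same items as `range(10)`; the buffer only changes when items are produced, not what is produced.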

v0.9.0 (Dec 4, 2019)
Possibly breaking changes:

Set global numpy seed (4a7cd58)

Split in_proj_weight into separate k, v, q projections in MultiheadAttention (fdf4c3e)

TransformerEncoder returns namedtuples instead of dict (27568a7)
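Splitting a fused in_proj_weight into separate projections is a pure re-slicing: projecting with the fused matrix gives the same numbers as concatenating the three separate projections. The sketch below shows this with plain Python lists, assuming the fused rows are stacked as q, then k, then v (verify the order against the checkpoint you are converting); `split_in_proj` and `matvec` are hypothetical helpers. The same slicing applies to a fused bias vector of length 3*d.

```python
def split_in_proj(fused):
    """Split a fused sequence of 3*d rows (or 3*d bias values) into q, k, v parts.

    Assumes the rows are stacked in q, k, v order.
    """
    if len(fused) % 3:
        raise ValueError("expected a length divisible by 3")
    d = len(fused) // 3
    return fused[:d], fused[d:2 * d], fused[2 * d:]


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


# Toy fused weight with embed_dim d = 2, so the matrix has 3 * 2 = 6 rows.
fused = [[1, 0], [0, 1], [2, 0], [0, 2], [3, 0], [0, 3]]
q, k, v = split_in_proj(fused)
```

Because `matvec(fused, x)` equals `matvec(q, x) + matvec(k, x) + matvec(v, x)` (list concatenation), separate q, k and v projections can be loaded from an old fused checkpoint without changing model outputs.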

New features:

Add --fast-stat-sync option (e1ba32a)

Add --empty-cache-freq option (315c463)

Support criterions with parameters (ba5f829)

New papers:

Simple and Effective Noisy Channel Modeling for Neural Machine Translation (49177c9)

Levenshtein Transformer (86857a5, ...)

Cross+Self-Attention for Transformer Models (4ac2c5f)

Jointly Learning to Align and Translate with Transformer Models (1c66792)

Reducing Transformer Depth on Demand with Structured Dropout (dabbef4)

Unsupervised Cross-lingual Representation Learning at Scale (XLM-RoBERTa) (e23e5ea)

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (a92bcda)

CamemBERT: a French BERT (b31849a)

Speed improvements:

Add CUDA kernels for LightConv and DynamicConv (f840564)

Cythonization of various dataloading components (4fc3953, ...)

Don't project mask tokens for MLM training (718677e)

v0.8.0 (Aug 14, 2019)
Changelog:

Relicensed under MIT license

Add RoBERTa

Add wav2vec

Add WMT'19 models

Add initial ASR code

Changed torch.hub interface (generate renamed to translate)

Add --tokenizer and --bpe

f812e52: Renamed data.transforms -> data.encoders

654affc: New Dataset API (optional)

47fd985: Deprecate old Masked LM components

5f78106: Set mmap as default dataset format and infer format automatically

Misc fixes for sampling

Misc fixes to support PyTorch 1.2

v0.7.2 (Jul 19, 2019)

No major API changes since the last release. We're cutting a new release now because we'll soon be merging significant (possibly breaking) changes to logging, data loading and the masked LM implementation.
v0.7.1 (Jun 20, 2019)
Changelog:

9462a81: Enhanced MMapIndexedDataset: less memory, higher speed

392fce8: Add code for wav2vec paper

v0.7.0 (Jun 19, 2019)
Notable (possibly breaking) changes:

d45db80: Move checkpoint utility functions from utils.py into checkpoint_utils.py

f2563c2: Move LM definitions into separate files

dffb167: Updates to model API:
- FairseqModel -> FairseqEncoderDecoderModel
- Add FairseqDecoder.extract_features and FairseqDecoder.output_layer
- encoder_out_dict -> encoder_out
- Remove unused remove_head functions

34726d5: Move distributed_init into DistributedFairseqModel

cf17068: Simplify distributed launch by automatically launching multiprocessing on each node for all visible GPUs (allows launching just one job per node instead of one per GPU)

d45db80: Change default LR scheduler from reduce_lr_on_plateau to fixed

96ac28d: Rename --sampling-temperature -> --temperature

fc1a19a: Deprecate dummy batches

a1c997b: Add memory mapped datasets

0add50c: Allow cycling over multiple datasets, where each one becomes an "epoch"

Plus many additional features and bugfixes
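Cycling over multiple datasets, one per "epoch", reduces to choosing a shard from the epoch number. The sketch below assumes the data directories are given as one colon-separated string and that epochs are numbered from 1, as in the v0.10.0 change to 1-based epochs; the function name and path convention are assumptions for illustration, not a documented fairseq API.

```python
def shard_for_epoch(data, epoch):
    """Return the data directory for a 1-based `epoch`, cycling through shards.

    `data` is assumed to be a colon-separated list of directories.
    """
    if epoch < 1:
        raise ValueError("epochs are 1-based")
    paths = [p for p in data.split(":") if p]
    if not paths:
        raise ValueError("no data directories given")
    # Epoch 1 -> first shard, epoch len(paths) + 1 -> first shard again.
    return paths[(epoch - 1) % len(paths)]
```

With three shards, epochs 1, 2 and 3 read shards 0, 1 and 2, and epoch 4 wraps around to shard 0, so a very large corpus can be split into pieces that are each small enough to load at once.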
v0.6.2 (Mar 15, 2019)
Changelog:

998ba4f: Add language models from Baevski & Auli (2018)

4294c4f: Add mixture of experts code from Shen et al. (2019)

0049349: Add example for multilingual training

48d9afb: Speed improvements, including fused operators from apex

44d27e6: Add TensorBoard support

d17fa85: Add Adadelta optimizer

9e1c880: Add FairseqEncoderModel

b65c579: Add FairseqTask.inference_step to modularize generate.py

2ad1178: Add back --curriculum

Misc bug fixes and other features

v0.6.1 (Feb 9, 2019)

Bumping version number for PyPI release.
v0.6.0 (Sep 26, 2018)
Changelog:

4908863: Switch to DistributedDataParallelC10d and bump version 0.5.0 -> 0.6.0
- No more FP16Trainer; we just have an FP16Optimizer wrapper
- Most of the distributed code is moved to a new wrapper class called DistributedFairseqModel, which behaves like DistributedDataParallel and a FairseqModel at the same time
- Trainer now requires an extra dummy_batch argument at initialization, on which we run a forward/backward pass when there's an uneven number of batches per worker. We hide the gradients from these dummy batches by multiplying the loss by 0
- Trainer.train_step now takes a list of samples, which will allow a cleaner --update-freq implementation

1c56b58: parallelize preprocessing

Misc bug fixes and features
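Passing a list of samples to Trainer.train_step is what makes --update-freq (gradient accumulation) natural: gradients from several micro-batches are summed before one optimizer step, emulating a larger mini-batch on a single GPU. The sketch below uses a one-parameter least-squares model in plain Python to show why the accumulated gradient equals the large-batch gradient when each micro-batch is weighted by its share of the examples. fairseq's own loss normalization differs in detail, and every name here is hypothetical.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y_hat = w * x over `batch`."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Accumulate per-micro-batch gradients, weighted by each batch's share of examples."""
    total = sum(len(batch) for batch in micro_batches)
    grad = 0.0
    for batch in micro_batches:
        # Weighting by len(batch) / total keeps uneven micro-batches consistent
        # with the gradient of one large batch containing every example.
        grad += grad_mse(w, batch) * len(batch) / total
    return grad


def train_step(w, micro_batches, lr):
    """One SGD update applied only after all micro-batches are processed."""
    return w - lr * accumulated_grad(w, micro_batches)


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
```

Splitting `data` into `[data[:2], data[2:]]` (or unevenly, `[data[:1], data[1:]]`) gives the same gradient as `grad_mse(w, data)`, so one update after accumulating over the micro-batches matches one update on the full batch, at the memory cost of a single micro-batch.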

v0.5.0 (Jun 15, 2018)

v0.4.0 (Jun 15, 2018)

