The official implementation of VAENAR-TTS, a VAE-based non-autoregressive TTS model.

Last update: Oct 28, 2022

VAENAR-TTS

This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis".

Samples | Paper | Pretrained Models

Usage

0. Dataset

English: LJSpeech
Mandarin: DataBaker (标贝)

1. Environment setup

conda env create -f environment.yml
conda activate vaenartts-env

2. Data pre-processing

For English using LJSpeech:

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset ljspeech --data_dir /path/to/extracted/LJSpeech-1.1 --save_dir ./ljspeech

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset databaker --data_dir /path/to/extracted/biaobei --save_dir ./databaker
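
In both commands, the empty `CUDA_VISIBLE_DEVICES=` hides every GPU from CUDA, so pre-processing runs on the CPU. A minimal sketch of building the same invocation from Python follows. The helper name `build_preprocess_command` is hypothetical and not part of this repo; it uses only the flags shown above.

```python
import os
import sys


def build_preprocess_command(dataset, data_dir, save_dir):
    """Return (argv, env) mirroring the preprocess.py commands above.

    Hypothetical helper: only the flags shown in this README are used.
    """
    argv = [
        sys.executable, "preprocess.py",
        "--dataset", dataset,
        "--data_dir", data_dir,
        "--save_dir", save_dir,
    ]
    # An empty CUDA_VISIBLE_DEVICES hides all GPUs from the child process.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="")
    return argv, env
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)` from the repo root.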

3. Training

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset ljspeech --log_dir ./lj-log_dir --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset databaker --log_dir ./db-log_dir --test_dir ./db-test_dir --data_dir ./databaker/tfrecords/ --model_dir ./db-model_dir
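
The `--data_dir` passed to `train.py` is the `tfrecords/` folder under the `--save_dir` used in step 2. As a hedged sketch, the check below confirms that folder exists and is non-empty before a long training run starts. The helper name `tfrecords_ready` is hypothetical, and it assumes nothing about the file names inside the folder.

```python
from pathlib import Path


def tfrecords_ready(save_dir):
    """Return True if <save_dir>/tfrecords exists and holds at least one file."""
    records_dir = Path(save_dir) / "tfrecords"
    if not records_dir.is_dir():
        return False
    # Any regular file counts; the naming scheme is not assumed here.
    return any(p.is_file() for p in records_dir.iterdir())
```

For example, `tfrecords_ready("./ljspeech")` should be true before running the LJSpeech training command.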

4. Inference (synthesize speech for the whole test set)

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset ljspeech --test_dir ./lj-test-2000 --data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./lj-model_dir/ckpt-2000

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./databaker/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./db-model_dir/ckpt-2000
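
The value given to `--ckpt_path` (for example `./lj-model_dir/ckpt-2000`) looks like a checkpoint prefix rather than a single file. The sketch below picks the highest-numbered prefix in a model directory. It assumes checkpoints follow TensorFlow's usual layout, where each `ckpt-<step>` prefix has a matching `ckpt-<step>.index` file; that layout is an assumption about this repo, and `latest_ckpt_prefix` is a hypothetical helper.

```python
import re
from pathlib import Path


def latest_ckpt_prefix(model_dir):
    """Return the 'ckpt-<step>' prefix with the largest step, or None if absent."""
    best_step, best_prefix = -1, None
    for index_file in Path(model_dir).glob("ckpt-*.index"):
        match = re.fullmatch(r"ckpt-(\d+)\.index", index_file.name)
        if match and int(match.group(1)) > best_step:
            best_step = int(match.group(1))
            # Drop the ".index" suffix to get the prefix for --ckpt_path.
            best_prefix = str(index_file.with_suffix(""))
    return best_prefix
```

Steps are compared as integers, so `ckpt-2000` ranks above `ckpt-500` even though it sorts lower as a string.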

Owner

THUHCSI
