The official implementation of VAENAR-TTS, a VAE-based non-autoregressive TTS model.

Last update: Oct 28, 2022

Overview

VAENAR-TTS

This repo contains code accompanying the paper "VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis".

Samples | Paper | Pretrained Models

Usage

0. Dataset

English: LJSpeech
Mandarin: DataBaker (标贝)

1. Environment setup

conda env create -f environment.yml
conda activate vaenartts-env

2. Data pre-processing

For English using LJSpeech:

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset ljspeech --data_dir /path/to/extracted/LJSpeech-1.1 --save_dir ./ljspeech

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES= python preprocess.py --dataset databaker --data_dir /path/to/extracted/biaobei --save_dir ./databaker
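
The training commands in the next step read from a `tfrecords/` subdirectory of the `--save_dir` used here, so it can be worth confirming that the directory exists and is non-empty before training. A minimal sketch, assuming pre-processing writes its records to `<save_dir>/tfrecords` (inferred from the training `--data_dir` values, not confirmed by the repo); `has_tfrecords` is a hypothetical helper, not part of the repository:

```shell
# Hypothetical helper (not part of the repo): succeeds only if
# <save_dir>/tfrecords exists and contains at least one entry.
has_tfrecords() {
  dir="$1/tfrecords"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage: check both output directories named in the commands above.
for save_dir in ./ljspeech ./databaker; do
  if has_tfrecords "$save_dir"; then
    echo "$save_dir: tfrecords found"
  else
    echo "$save_dir: no tfrecords yet"
  fi
done
```

The check only looks for a non-empty directory; it makes no assumption about the file names inside it.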

3. Training

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset ljspeech --log_dir ./lj-log_dir --test_dir ./lj-test_dir --data_dir ./ljspeech/tfrecords/ --model_dir ./lj-model_dir

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python train.py --dataset databaker --log_dir ./db-log_dir --test_dir ./db-test_dir --data_dir ./databaker/tfrecords/ --model_dir ./db-model_dir
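
Both training commands set `CUDA_VISIBLE_DEVICES` and `TF_FORCE_GPU_ALLOW_GROWTH` as prefixes, which in POSIX shells applies them to that one command only and leaves the calling shell untouched. The sketch below illustrates that scoping with a stand-in child process instead of `train.py`; the `sh -c` call is purely illustrative.

```shell
# Start from a clean state so the result does not depend on the caller's environment.
unset CUDA_VISIBLE_DEVICES

# The prefix assignment is visible inside the child process only.
inside="$(CUDA_VISIBLE_DEVICES=0 sh -c 'printf "%s" "$CUDA_VISIBLE_DEVICES"')"

# Back in the parent shell the variable is still unset.
after="${CUDA_VISIBLE_DEVICES-unset}"

echo "inside=$inside after=$after"
```

To train on a different GPU, change the index for that one invocation. The empty value used in the pre-processing step (`CUDA_VISIBLE_DEVICES=`) is commonly used to hide all GPUs from CUDA so that the script runs on the CPU.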

4. Inference (synthesize speech for the whole test set)

For English using LJSpeech:

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset ljspeech --test_dir ./lj-test-2000 --data_dir ./ljspeech/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./lj-model_dir/ckpt-2000

For Mandarin using DataBaker (标贝):

CUDA_VISIBLE_DEVICES=0 TF_FORCE_GPU_ALLOW_GROWTH=true python inference.py --dataset databaker --test_dir ./db-test-2000 --data_dir ./databaker/tfrecords/ --batch_size 16 --write_wavs true --draw_alignments true --ckpt_path ./db-model_dir/ckpt-2000
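
In the commands above, the checkpoint step (`2000`) appears in both `--test_dir` and `--ckpt_path`. A small wrapper can keep the two in sync when evaluating other checkpoints. This is a sketch only: `infer_args` is a hypothetical helper, and the directory naming simply mirrors the examples above.

```shell
# Hypothetical helper (not part of the repo): print matching --test_dir and
# --ckpt_path arguments for a dataset prefix ("lj" or "db") and a checkpoint step.
infer_args() {
  prefix="$1"
  step="$2"
  printf '%s\n' "--test_dir ./${prefix}-test-${step} --ckpt_path ./${prefix}-model_dir/ckpt-${step}"
}

# Example: show the arguments for the LJSpeech checkpoint used above.
infer_args lj 2000
```

The printed arguments can be spliced into the inference command with an unquoted `$(infer_args lj 2000)`, alongside the remaining flags shown above.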

Owner

THUHCSI
