PyTorch implementation of Tacotron speech synthesis model.

Last update: Dec 09, 2022

Overview

tacotron_pytorch


Inspired by keithito/tacotron. The speech quality is currently not as good as what keithito/tacotron can generate, but the model seems to be basically working. You can find some generated speech examples, trained on the LJ Speech Dataset, here.

If you are comfortable working with TensorFlow, I'd recommend trying https://github.com/keithito/tacotron instead. The reason for rewriting it in PyTorch is that it's easier to debug and extend (multi-speaker architecture, etc.), at least for me.

Requirements

PyTorch
TensorFlow (needed only to run the training script; this could be made optional, but it is required for now)

Installation

git clone --recursive https://github.com/r9y9/tacotron_pytorch
pip install -e . # or python setup.py develop

If you want to run the training script, then you need to install additional dependencies.

pip install -e ".[train]"

Training

The package relies on keithito/tacotron (added as a submodule) for text processing, audio preprocessing and audio reconstruction. Please follow the quick start section at https://github.com/keithito/tacotron and prepare your dataset accordingly.

Once your data is prepared, and assuming it is in "~/tacotron/training" (the default location), you can train your model with:

python train.py

The alignment, predicted spectrogram, target spectrogram, predicted waveform and a checkpoint (model and optimizer states) are saved every 1000 global steps in the checkpoints directory. Training progress can be monitored with:

tensorboard --logdir=log
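
To illustrate the periodic checkpointing described above, here is a minimal sketch of saving model and optimizer states every fixed number of global steps and restoring them later. It is not the code from train.py: the toy nn.Linear model, the save_checkpoint and train helpers, the dictionary keys and the file-name pattern are all assumptions made for this example.

```python
import os
import tempfile

import torch
from torch import nn, optim


def save_checkpoint(model, optimizer, step, checkpoint_dir):
    """Save model and optimizer states for the given global step."""
    os.makedirs(checkpoint_dir, exist_ok=True)
    # Hypothetical file-name pattern and dictionary keys, chosen for this sketch.
    path = os.path.join(checkpoint_dir, "checkpoint_step{}.pth".format(step))
    torch.save(
        {
            "state_dict": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "global_step": step,
        },
        path,
    )
    return path


def train(num_steps, checkpoint_interval, checkpoint_dir):
    """Run a toy training loop, checkpointing every `checkpoint_interval` steps."""
    torch.manual_seed(0)
    model = nn.Linear(4, 2)  # stand-in for the real Tacotron model
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    saved = []
    for global_step in range(1, num_steps + 1):
        x, y = torch.randn(8, 4), torch.randn(8, 2)  # random dummy batch
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if global_step % checkpoint_interval == 0:
            saved.append(save_checkpoint(model, optimizer, global_step, checkpoint_dir))
    return model, saved


checkpoint_dir = tempfile.mkdtemp()
model, saved_paths = train(num_steps=6, checkpoint_interval=3, checkpoint_dir=checkpoint_dir)

# Restore the most recent checkpoint into a freshly constructed model.
checkpoint = torch.load(saved_paths[-1], map_location="cpu")
restored = nn.Linear(4, 2)
restored.load_state_dict(checkpoint["state_dict"])
```

Storing the optimizer state alongside the model weights is what allows training to resume from a checkpoint rather than only run inference from it.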

Testing model

Open the notebook in the notebooks directory and change checkpoint_path to point to your trained model checkpoint.
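
If the checkpoints directory holds several checkpoints, a small helper can pick the most recent one to use as checkpoint_path. The sketch below assumes files are named checkpoint_step<N>.pth; that pattern and the latest_checkpoint helper are hypothetical, so adjust them to match the files your training run actually produced.

```python
import re
import tempfile
from pathlib import Path


def latest_checkpoint(checkpoint_dir):
    """Return the checkpoint file with the highest step number, or None."""
    # Assumed naming scheme: checkpoint_step<N>.pth
    pattern = re.compile(r"checkpoint_step(\d+)\.pth")
    candidates = []
    for path in Path(checkpoint_dir).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare step numbers numerically so step 10000 sorts after step 2000.
    return max(candidates, key=lambda item: item[0])[1]


# Demo with empty placeholder files in a temporary directory.
demo_dir = Path(tempfile.mkdtemp())
for step in (1000, 2000, 10000):
    (demo_dir / "checkpoint_step{}.pth".format(step)).touch()

checkpoint_path = str(latest_checkpoint(demo_dir))
```

Sorting on the parsed integer rather than on the file name avoids the lexicographic trap where "checkpoint_step2000" would rank above "checkpoint_step10000".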

Owner

Ryuichi Yamamoto
