ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library

ERISHA is a multilingual, multispeaker expressive speech synthesis framework. It can transfer expressivity to the voice of a speaker for whom no expressive speech corpus is available. The name ERISHA means "speech" in Sanskrit. The framework includes several deep learning components for building the prosody encoder: Global Style Token (GST), Variational Autoencoder (VAE), Gaussian Mixture Variational Autoencoder (GMVAE), and X-vectors.

The library is currently at an early stage of development and will be updated frequently.

Stay tuned for more updates. We are open to collaboration!

Installation and Training

Refer to INSTALL for initial setup.

Available recipes

Available Features

Resampling of speech waveforms to the target sampling rate in recipes
Support for training TTS systems for other languages
Support for training multilingual TTS systems for other languages
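
As an illustration of the resampling feature listed above, here is a minimal sketch of converting a mono waveform to a target sampling rate. It is not the code used in the ERISHA recipes: the function name `resample_linear` is hypothetical, and plain linear interpolation stands in for whatever resampler the recipes actually call. It applies no anti-aliasing filter, so for real corpora a dedicated audio library would be the safer choice.

```python
def resample_linear(samples, orig_sr, target_sr):
    """Resample a mono waveform by linear interpolation (illustrative only).

    samples   : sequence of numeric amplitude values
    orig_sr   : sampling rate of `samples` in Hz
    target_sr : desired sampling rate in Hz
    """
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("sampling rates must be positive")
    if not samples:
        return []
    if orig_sr == target_sr:
        return list(samples)

    # Number of output samples that covers the same duration.
    n_out = max(1, round(len(samples) * target_sr / orig_sr))
    # How far we advance in the input for each output sample.
    step = orig_sr / target_sr
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), last)   # left neighbour, clamped to the last sample
        hi = min(lo + 1, last)     # right neighbour, clamped to the last sample
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    # Toy example: a 4-sample ramp "recorded" at 4 Hz, upsampled to 8 Hz.
    print(resample_linear([0.0, 1.0, 2.0, 3.0], 4, 8))
```

In a recipe, a step like this would typically run once per audio file before feature extraction, so that every speaker and language in the training set shares one sampling rate.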

Upcoming updates

User documentation
PyTorch Lightning
Multiclass N-pair loss
Cluster sampling for improving latent representation of speaker and expressivity (proposed work)

Acknowledgements

This implementation uses code from the following repos: NVIDIA, Keith Ito, Prem Seetharaman, Chengqi Deng, Dannynis, and Jhosimar George Arias Figueroa.

Owner

Ajinkya Kulkarni
