Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC)

Last update: Dec 28, 2022

ppg-vc

This repo implements several kinds of PPG-based VC models. Pretrained models: see the TODO section for upload status. More models are on the way.

Notes:

The PPG model provided in conformer_ppg_model is based on a hybrid CTC-attention phoneme recognizer trained on LibriSpeech (960 hours). PPGs have a frame shift of 10 ms and a dimensionality of 144. This model is very similar to the one used in this paper.
This repo uses HiFi-GAN V1 as the vocoder; the synthesized audio has a sampling rate of 24 kHz.
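
To make the numbers in these notes concrete, here is a minimal arithmetic sketch in Python. It only uses the three figures quoted above (10 ms frame shift, 144 dimensions, 24 kHz output). The function names are hypothetical and are not part of this repo, and the exact frame count produced by the real extractor may differ slightly depending on padding and windowing.

```python
from typing import Tuple

PPG_FRAME_SHIFT_MS = 10   # frame shift quoted in the notes
PPG_DIM = 144             # PPG dimensionality quoted in the notes
OUTPUT_SR = 24_000        # sampling rate of the synthesized audio


def approx_ppg_shape(duration_s: float) -> Tuple[int, int]:
    """Approximate (num_frames, dim) of the PPG matrix for one utterance.

    Assumes one frame per full 10 ms of audio; real extractors may add or
    drop a frame at the edges.
    """
    duration_ms = round(duration_s * 1000)
    return duration_ms // PPG_FRAME_SHIFT_MS, PPG_DIM


def samples_per_frame(sample_rate: int = OUTPUT_SR) -> int:
    """Number of audio samples covered by one 10 ms frame at sample_rate."""
    return sample_rate * PPG_FRAME_SHIFT_MS // 1000


print(approx_ppg_shape(2.5))   # a 2.5 s utterance
print(samples_per_frame())     # samples per frame at 24 kHz
```

Under these assumptions, a 2.5 s utterance gives roughly 250 frames of 144 values each, and one frame spans 240 samples of 24 kHz audio. Whether the vocoder is configured with exactly that hop size is not stated here.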

Highlights

Any-to-many VC
Any-to-any VC (a.k.a. few/one-shot VC)

How to use

Data preprocessing

Please run 1_compute_ctc_att_bnf.py to compute PPG features.
Please run 2_compute_f0.py to compute fundamental frequency.
Please run 3_compute_spk_dvecs.py to compute speaker d-vectors.
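
The three steps above can be chained with a small Python driver, sketched below. Only the script names come from this README. The `extra_args` mapping is hypothetical, because the command-line arguments each script expects are not documented here; check each script's argument parser before running. Running the scripts in numeric order is also an assumption based on their file names. By default the sketch only prints the commands (a dry run).

```python
import subprocess
import sys

# Script names are taken from the steps above; everything else is hypothetical.
PREPROCESSING_SCRIPTS = [
    "1_compute_ctc_att_bnf.py",  # PPG features
    "2_compute_f0.py",           # fundamental frequency
    "3_compute_spk_dvecs.py",    # speaker d-vectors
]


def build_commands(extra_args=None):
    """Return one command (argument list) per script, in numeric order.

    extra_args is a hypothetical dict mapping a script name to a list of
    command-line arguments for that script.
    """
    extra_args = extra_args or {}
    return [
        [sys.executable, script, *extra_args.get(script, [])]
        for script in PREPROCESSING_SCRIPTS
    ]


def run_all(extra_args=None, dry_run=True):
    """Print each command; unless dry_run is True, run it and stop on failure."""
    for cmd in build_commands(extra_args):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()  # dry run: prints the three commands without executing them
```

Call `run_all(dry_run=False)` from the repository root once the required arguments for each script have been filled in.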

Training

Please refer to run.sh

Conversion

Please refer to test.sh

TODO

Upload pretrained models.

Citations

@ARTICLE{liu2021any,
  author={Liu, Songxiang and Cao, Yuewen and Wang, Disong and Wu, Xixin and Liu, Xunying and Meng, Helen},
  journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing}, 
  title={Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling}, 
  year={2021},
  volume={29},
  number={},
  pages={1717-1728},
  doi={10.1109/TASLP.2021.3076867}
}

@inproceedings{Liu2018,
  author={Songxiang Liu and Jinghua Zhong and Lifa Sun and Xixin Wu and Xunying Liu and Helen Meng},
  title={Voice Conversion Across Arbitrary Speakers Based on a Single Target-Speaker Utterance},
  year={2018},
  booktitle={Proc. Interspeech 2018},
  pages={496--500},
  doi={10.21437/Interspeech.2018-1504},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1504}
}

Owner

Liu Songxiang
