Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Last update: Dec 07, 2022

Related tags

Deep Learning WadaIN-VC

Overview

One-Shot Voice Conversion with Weight Adaptive Instance Normalization

By Shengjie Huang, Yanyan Xu*, Dengfeng Ke*, Mingjie Chen, Thomas Hain.

This repo is the official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Audio samples are available at here.

Dependencies

python 3.6.0
pytorch 1.4.0
pyyaml 5.4.1
numpy 1.19.5
librosa 0.8.0
soundfile 0.10.2
tensorboardX 2.1

Preprocess

What you need to prepare first before running this project and how to prepare them

We use the ParallelWaveGAN as our vocoder, and VCTK as our data set.
If you wanna run our project, please install as the description of ParallelWaveGAN project first.
And then prepare all the mel-spectrogram data as ParallelWaveGAN do.
Prepare the speaker_used.json file by yourself, as ./data/80_train_speaker_used.json and ./data/fine_tune_speaker_used.json show.
Prepare the feats.scp file by runing ./convert_decode/convert_mel/get_scp.py .

Assume that your prepared mel-spectrograms are sorted in the files tree like:

├── p225
│   ├── p225_001-feats.npy
│   ├── p225_004-feats.npy
│   ├── p225_005-feats.npy
│   ......
├── p226
│   ├── p226_001-feats.npy
│   ├── p226_003-feats.npy
│   ├── p226_004-feats.npy
│   ......
├── p227
│   ......
├── p228
│   ......
│   ...
│   ...

Training

Run the pretrain stage by bash run_main.sh. We use 80 speakers of VCTK data set, and all utterances for each person.

Fine Tuning

Run the fine tune stage by bash run_fine_tune.sh. We use the other 10 speakers of VCTK data set, and only 1 utterance for each person used.

Inference

$ cd convert_decode/convert_mel
$ bash run_convert.sh

We generate one-shot voice conversion utterances between the 10 one-shot speakers , and use their other unseen utterances to perform one-shot voice conversion!

Official implementation of "One-Shot Voice Conversion with Weight Adaptive Instance Normalization".

Related tags

Overview

One-Shot Voice Conversion with Weight Adaptive Instance Normalization

Dependencies

Preprocess

What you need to prepare first before running this project and how to prepare them

Assume that your prepared mel-spectrograms are sorted in the files tree like:

Training

Fine Tuning

Inference

Owner

Lighting the Darkness in the Deep Learning Era: A Survey, An Online Platform, A New Dataset

CVPR2020 Counterfactual Samples Synthesizing for Robust VQA

Yolov5+SlowFast: Realtime Action Detection Based on PytorchVideo

AFL binary instrumentation

An auto discord account and token generator. Automatically verifies the phone number. Works without proxy. Bypasses captcha.

DeepStochlog Package For Python

Interpretation of T cell states using reference single-cell atlases

PyTorch implementation of MulMON

Machine Learning Privacy Meter: A tool to quantify the privacy risks of machine learning models with respect to inference attacks, notably membership inference attacks

YolactEdge: Real-time Instance Segmentation on the Edge

Advantage Actor Critic (A2C): jax + flax implementation

Open-AI's DALL-E for large scale training in mesh-tensorflow.

High-Resolution Image Synthesis with Latent Diffusion Models

CVAT is free, online, interactive video and image annotation tool for computer vision

A Real-ESRGAN equipped Colab notebook for CLIP Guided Diffusion

An implementation of a sequence to sequence neural network using an encoder-decoder

Generate fine-tuning samples & Fine-tuning the model & Generate samples by transferring Note On

Code for the paper "Combining Textual Features for the Detection of Hateful and Offensive Language"

Neural Point-Based Graphics

Official PyTorch implementation of the paper "Self-Supervised Relational Reasoning for Representation Learning", NeurIPS 2020 Spotlight.