Implementation of Kaneko et al.'s MaskCycleGAN-VC model for non-parallel voice conversion.

Overview

MaskCycleGAN-VC

Unofficial PyTorch implementation of Kaneko et al.'s MaskCycleGAN-VC (2021) for non-parallel voice conversion.

MaskCycleGAN-VC is the state of the art method for non-parallel voice conversion using CycleGAN. It is trained using a novel auxiliary task of filling in frames (FIF) by applying a temporal mask to the input Mel-spectrogram. It demonstrates marked improvements over prior models such as CycleGAN-VC (2018), CycleGAN-VC2 (2019), and CycleGAN-VC3 (2020).


Figure1: MaskCycleGAN-VC Training




Figure2: MaskCycleGAN-VC Generator Architecture




Figure3: MaskCycleGAN-VC PatchGAN Discriminator Architecture



Paper: https://arxiv.org/pdf/2102.12841.pdf

Repository Contributors: Claire Pajot, Hikaru Hotta, Sofian Zalouk

Setup

Clone the repository.

git clone [email protected]:GANtastic3/MaskCycleGAN-VC.git
cd MaskCycleGAN-VC

Create the conda environment.

conda env create -f environment.yml
conda activate MaskCycleGAN-VC

VCC2018 Dataset

The authors of the paper used the dataset from the Spoke task of Voice Conversion Challenge 2018 (VCC2018). This is a dataset of non-parallel utterances from 6 male and 6 female speakers. Each speaker utters approximately 80 sentences.

Download the dataset from the command line.

wget --no-check-certificate https://datashare.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_training.zip?sequence=2&isAllowed=y
wget --no-check-certificate https://datashare.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_evaluation.zip?sequence=3&isAllowed=y
wget --no-check-certificate https://datashare.ed.ac.uk/bitstream/handle/10283/3061/vcc2018_database_reference.zip?sequence=5&isAllowed=y

Unzip the dataset file.

mkdir vcc2018
apt-get install unzip
unzip vcc2018_database_training.zip?sequence=2 -d vcc2018/
unzip vcc2018_database_evaluation.zip?sequence=3 -d vcc2018/
unzip vcc2018_database_reference.zip?sequence=5 -d vcc2018/
mv -v vcc2018/vcc2018_reference/* vcc2018/vcc2018_evaluation
rm -rf vcc2018/vcc2018_reference

Data Preprocessing

To expedite training, we preprocess the dataset by converting waveforms to melspectograms, then save the spectrograms as pickle files normalized.pickle and normalization statistics (mean, std) as npz files _norm_stats.npz. We convert waveforms to spectrograms using a melgan vocoder to ensure that you can decode voice converted spectrograms to waveform and listen to your samples during inference.

python data_preprocessing/preprocess_vcc2018.py \
  --data_directory vcc2018/vcc2018_training \
  --preprocessed_data_directory vcc2018_preprocessed/vcc2018_training \
  --speaker_ids VCC2SF1 VCC2SF2 VCC2SF3 VCC2SF4 VCC2SM1 VCC2SM2 VCC2SM3 VCC2SM4 VCC2TF1 VCC2TF2 VCC2TM1 VCC2TM2
python data_preprocessing/preprocess_vcc2018.py \
  --data_directory vcc2018/vcc2018_evaluation \
  --preprocessed_data_directory vcc2018_preprocessed/vcc2018_evaluation \
  --speaker_ids VCC2SF1 VCC2SF2 VCC2SF3 VCC2SF4 VCC2SM1 VCC2SM2 VCC2SM3 VCC2SM4 VCC2TF1 VCC2TF2 VCC2TM1 VCC2TM2

Training

Train MaskCycleGAN-VC to convert between and . You should start to get excellent results after only several hundred epochs.

python -W ignore::UserWarning -m mask_cyclegan_vc.train \
    --name mask_cyclegan_vc__ \
    --seed 0 \
    --save_dir results/ \
    --preprocessed_data_dir vcc2018_preprocessed/vcc2018_training/ \
    --speaker_A_id  \
    --speaker_B_id  \
    --epochs_per_save 100 \
    --epochs_per_plot 10 \
    --num_epochs 6172 \
    --batch_size 1 \
    --lr 5e-4 \
    --decay_after 1e4 \
    --sample_rate 22050 \
    --num_frames 64 \
    --max_mask_len 25 \
    --gpu_ids 0 \

To continue training from a previous checkpoint in the case that training is suspended, add the argument --continue_train while keeping all others the same. The model saver class will automatically load the most recently saved checkpoint and resume training.

Launch Tensorboard in a separate terminal window.

tensorboard --logdir results/logs

Testing

Test your trained MaskCycleGAN-VC by converting between and on the evaluation dataset. Your converted .wav files are stored in results//converted_audio.

python -W ignore::UserWarning -m mask_cyclegan_vc.test \
    --name mask_cyclegan_vc_VCC2SF3_VCC2TF1 \
    --save_dir results/ \
    --preprocessed_data_dir vcc2018_preprocessed/vcc2018_evaluation \
    --gpu_ids 0 \
    --speaker_A_id VCC2SF3 \
    --speaker_B_id VCC2TF1 \
    --ckpt_dir /data1/cycleGAN_VC3/mask_cyclegan_vc_VCC2SF3_VCC2TF1/ckpts \
    --load_epoch 500 \
    --model_name generator_A2B \

Toggle between A->B and B->A conversion by setting --model_name as either generator_A2B or generator_B2A.

Select the epoch to load your model from by setting --load_epoch.

Code Organization

├── README.md                       <- Top-level README.
├── environment.yml                 <- Conda environment
├── .gitignore
├── LICENSE
|
├── args
│   ├── base_arg_parser             <- arg parser
│   ├── train_arg_parser            <- arg parser for training (inherits base_arg_parser)
│   ├── cycleGAN_train_arg_parser   <- arg parser for training MaskCycleGAN-VC (inherits train_arg_parser)
│   ├── cycleGAN_test_arg_parser    <- arg parser for testing MaskCycleGAN-VC (inherits base_arg_parser)
│
├── bash_scripts
│   ├── mask_cyclegan_train.sh      <- sample script to train MaskCycleGAN-VC
│   ├── mask_cyclegan_test.sh       <- sample script to test MaskCycleGAN-VC
│
├── data_preprocessing
│   ├── preprocess_vcc2018.py       <- preprocess VCC2018 dataset
│
├── dataset
│   ├── vc_dataset.py               <- torch dataset class for MaskCycleGAN-VC
│
├── logger
│   ├── base_logger.sh              <- logging to Tensorboard
│   ├── train_logger.sh             <- logging to Tensorboard during training (inherits base_logger)
│
├── saver
│   ├── model_saver.py              <- saves and loads models
│
├── mask_cyclegan_vc
│   ├── model.py                    <- defines MaskCycleGAN-VC model architecture
│   ├── train.py                    <- training script for MaskCycleGAN-VC
│   ├── test.py                     <- training script for MaskCycleGAN-VC
│   ├── utils.py                    <- utility functions to train and test MaskCycleGAN-VC

Acknowledgements

This repository was inspired by jackaduma's implementation of CycleGAN-VC2.

Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Code and data to accompany the camera-ready version of "Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation" in EMNLP 2021

Mozhdeh Gheini 16 Jul 16, 2022
OpenCV, MediaPipe Pose Estimation, Affine Transform for Icon Overlay

Yoga Pose Identification and Icon Matching Project Goal Detect yoga poses performed by a user and overlay a corresponding icon image. Running the main

Anna Garverick 1 Dec 03, 2021
Marvis is Mastouri's Jarvis version of the AI-powered Python personal assistant.

Marvis v1.0 Marvis is Mastouri's Jarvis version of the AI-powered Python personal assistant. About M.A.R.V.I.S. J.A.R.V.I.S. is a fictional character

Reda Mastouri 1 Dec 29, 2021
GeDML is an easy-to-use generalized deep metric learning library

GeDML is an easy-to-use generalized deep metric learning library

Borui Zhang 32 Dec 05, 2022
YKKDetector For Python

YKKDetector OpenCVを利用した機械学習データをもとに、VRChatのスクリーンショットなどからYKKさん(もとい「幽狐族のお姉様」)を検出できるソフトウェアです。 マニュアル こちらから実行環境のセットアップから解説する詳細なマニュアルをご覧いただけます。 ライセンス 本ソフトウェア

あんふぃとらいと 5 Dec 07, 2021
An official PyTorch implementation of the TKDE paper "Self-Supervised Graph Representation Learning via Topology Transformations".

Self-Supervised Graph Representation Learning via Topology Transformations This repository is the official PyTorch implementation of the following pap

Hsiang Gao 2 Oct 31, 2022
Source code for the plant extraction workflow introduced in the paper “Agricultural Plant Cataloging and Establishment of a Data Framework from UAV-based Crop Images by Computer Vision”

Plant extraction workflow Source code for the plant extraction workflow introduced in the paper "Agricultural Plant Cataloging and Establishment of a

Maurice Günder 0 Apr 22, 2022
Official implementation for the paper "SAPE: Spatially-Adaptive Progressive Encoding for Neural Optimization".

SAPE Project page Paper Official implementation for the paper "SAPE: Spatially-Adaptive Progressive Encoding for Neural Optimization". Environment Cre

36 Dec 09, 2022
Alignment Attention Fusion framework for Few-Shot Object Detection

AAF framework Framework generalities This repository contains the code of the AAF framework proposed in this paper. The main idea behind this work is

Pierre Le Jeune 20 Dec 16, 2022
Out-of-distribution detection using the pNML regret. NeurIPS2021

OOD Detection Load conda environment conda env create -f environment.yml or install requirements: while read requirement; do conda install --yes $requ

Koby Bibas 23 Dec 02, 2022
Finite Element Analysis

FElupe - Finite Element Analysis FElupe is a Python 3.6+ finite element analysis package focussing on the formulation and numerical solution of nonlin

Andreas D. 20 Jan 09, 2023
A toolkit for document-level event extraction, containing some SOTA model implementations

❤️ A Toolkit for Document-level Event Extraction with & without Triggers Hi, there 👋 . Thanks for your stay in this repo. This project aims at buildi

Tong Zhu(朱桐) 159 Dec 22, 2022
Instance-wise Occlusion and Depth Orders in Natural Scenes (CVPR 2022)

Instance-wise Occlusion and Depth Orders in Natural Scenes Official source code. Appears at CVPR 2022 This repository provides a new dataset, named In

27 Dec 27, 2022
Official PyTorch(Geometric) implementation of DPGNN(DPGCN) in "Distance-wise Prototypical Graph Neural Network for Node Imbalance Classification"

DPGNN This repository is an official PyTorch(Geometric) implementation of DPGNN(DPGCN) in "Distance-wise Prototypical Graph Neural Network for Node Im

Yu Wang (Jack) 18 Oct 12, 2022
Framework for joint representation learning, evaluation through multimodal registration and comparison with image translation based approaches

CoMIR: Contrastive Multimodal Image Representation for Registration Framework 🖼 Registration of images in different modalities with Deep Learning 🤖

Methods for Image Data Analysis - MIDA 55 Dec 09, 2022
PyTorch implementation of Densely Connected Time Delay Neural Network

Densely Connected Time Delay Neural Network PyTorch implementation of Densely Connected Time Delay Neural Network (D-TDNN) in our paper "Densely Conne

Ya-Qi Yu 64 Oct 11, 2022
Namish Khanna 40 Oct 11, 2022
Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

ConSERT Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer Requirements torch==1.6.0

Yan Yuanmeng 478 Dec 25, 2022
Differentiable Simulation of Soft Multi-body Systems

Differentiable Simulation of Soft Multi-body Systems Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, Ming C. Lin [Paper] [Code] Updates The C++ backend s

YilingQiao 26 Dec 23, 2022
Offline Multi-Agent Reinforcement Learning Implementations: Solving Overcooked Game with Data-Driven Method

Overcooked-AI We suppose to apply traditional offline reinforcement learning technique to multi-agent algorithm. In this repository, we implemented be

Baek In-Chang 14 Sep 16, 2022