This code is an implementation for Singing TTS.

Overview

MLP Singer

This code is an implementation for Singing TTS. The algorithm is based on the following papers:

Tae, J., Kim, H., & Lee, Y. (2021). MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis. arXiv preprint arXiv:2106.07886.
Tolstikhin, I., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., ... & Dosovitskiy, A. (2021). Mlp-mixer: An all-mlp architecture for vision. arXiv preprint arXiv:2105.01601.

Structure

  • Structure is based on the MLP Singer.
  • I changed several hyper parameters and data type
    • One of mel or spectrogram is can be selected as a feature type.
    • Token type is changed from phoneme to grapheme.

Used dataset

Hyper parameters

Before proceeding, please set the pattern, inference, and checkpoint paths in Hyper_Parameters.yaml according to your environment.

  • Sound

    • Setting basic sound parameters.
  • Tokens

    • The number of Lyric token.
  • Max_Note

    • The highest note value for embedding.
  • Duration

    • Min duration is used at pattern generating only.
    • Max duration is decided the maximum time step of model. MLP mixer always use the maximum time step.
    • Equality set the strategy about syllable to grapheme.
      • When True, onset, nucleus, and coda have same length or ±1 difference.
      • When False, onset and coda have Consonant_Duration length, and nucleus has duration - 2 * Consonant_Duration.
  • Feature_Type

    • Setting the feature type (Mel or Spectrogram).
  • Encoder

    • Setting the encoder(embedding).
  • Mixer

    • Setting the MLP mixer.
  • Train

    • Setting the parameters of training.
  • Inference_Batch_Size

    • Setting the batch size when inference
  • Inference_Path

    • Setting the inference path
  • Checkpoint_Path

    • Setting the checkpoint path
  • Log_Path

    • Setting the tensorboard log path
  • Use_Mixed_Precision

    • Setting using mixed precision
  • Use_Multi_GPU

    • Setting using multi gpu
    • By the nvcc problem, Only linux supports this option.
    • If this is True, device parameter is also multiple like '0,1,2,3'.
    • And you have to change the training command also: please check multi_gpu.sh.
  • Device

    • Setting which GPU devices are used in multi-GPU enviornment.
    • Or, if using only CPU, please set '-1'. (But, I don't recommend while training.)

Generate pattern

  • Current version does not support any open source dataset.

Inference file path while training for verification.

  • Inference_for_Training
    • There are three examples for inference.
    • It is midi file based script.

Run

Command

Single GPU

python Train.py -hp  -s 
  • -hp

    • The hyper paramter file path
    • This is required.
  • -s

    • The resume step parameter.
    • Default is 0.
    • If value is 0, model try to search the latest checkpoint.

Multi GPU

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=32 python -m torch.distributed.launch --nproc_per_node=8 Train.py --hyper_parameters Hyper_Parameters.yaml --port 54322
Owner
Heejo You
Main focus: Psycholinguistics / Mechine learning / Deep learning
Heejo You
MPViT:Multi-Path Vision Transformer for Dense Prediction

MPViT : Multi-Path Vision Transformer for Dense Prediction This repository inlcu

Youngwan Lee 272 Dec 20, 2022
A Python library for adversarial machine learning focusing on benchmarking adversarial robustness.

ARES This repository contains the code for ARES (Adversarial Robustness Evaluation for Safety), a Python library for adversarial machine learning rese

Tsinghua Machine Learning Group 377 Dec 20, 2022
Cancer-and-Tumor-Detection-Using-Inception-model - In this repo i am gonna show you how i did cancer/tumor detection in lungs using deep neural networks, specifically here the Inception model by google.

Cancer-and-Tumor-Detection-Using-Inception-model In this repo i am gonna show you how i did cancer/tumor detection in lungs using deep neural networks

Deepak Nandwani 1 Jan 01, 2022
RNN Predict Street Commercial Vitality

RNN-for-Predicting-Street-Vitality Code and dataset for Predicting the Vitality of Stores along the Street based on Business Type Sequence via Recurre

Zidong LIU 1 Dec 15, 2021
A scanpy extension to analyse single-cell TCR and BCR data.

Scirpy: A Scanpy extension for analyzing single-cell immune-cell receptor sequencing data Scirpy is a scalable python-toolkit to analyse T cell recept

ICBI 145 Jan 03, 2023
Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Kaggle G2Net Gravitational Wave Detection : 2nd place solution

Hiroshechka Y 33 Dec 26, 2022
OREO: Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning (NeurIPS 2021)

OREO: Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning (NeurIPS 2021) Video demo We here provide a video demo from co

20 Nov 25, 2022
DeepOBS: A Deep Learning Optimizer Benchmark Suite

DeepOBS - A Deep Learning Optimizer Benchmark Suite DeepOBS is a benchmarking suite that drastically simplifies, automates and improves the evaluation

Aaron Bahde 7 May 12, 2020
Music library streaming app written in Flask & VueJS

djtaytay This is a little toy app made to explore Vue, brush up on my Python, and make a remote music collection accessable through a web interface. I

Ryan Tasson 6 May 27, 2022
Official repository for GCR rerank, a GCN-based reranking method for both image and video re-ID

Official repository for GCR rerank, a GCN-based reranking method for both image and video re-ID

53 Nov 22, 2022
Pytorch code for our paper "Feedback Network for Image Super-Resolution" (CVPR2019)

Feedback Network for Image Super-Resolution [arXiv] [CVF] [Poster] Update: Our proposed Gated Multiple Feedback Network (GMFN) will appear in BMVC2019

Zhen Li 539 Jan 06, 2023
Code for paper Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting

Decoupled Spatial-Temporal Graph Neural Networks Code for our paper: Decoupled Dynamic Spatial-Temporal Graph Neural Network for Traffic Forecasting.

S22 43 Jan 04, 2023
Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to

Mohamed Ali Souibgui 74 Jan 07, 2023
Medical image analysis framework merging ANTsPy and deep learning

ANTsPyNet A collection of deep learning architectures and applications ported to the python language and tools for basic medical image processing. Bas

Advanced Normalization Tools Ecosystem 118 Dec 24, 2022
PyTorch implementation of SimSiam: Exploring Simple Siamese Representation Learning

SimSiam: Exploring Simple Siamese Representation Learning This is a PyTorch implementation of the SimSiam paper: @Article{chen2020simsiam, author =

Facebook Research 834 Dec 30, 2022
Code for "Solving Graph-based Public Good Games with Tree Search and Imitation Learning"

Code for "Solving Graph-based Public Good Games with Tree Search and Imitation Learning" This is the code for the paper Solving Graph-based Public Goo

Victor-Alexandru Darvariu 3 Dec 05, 2022
CNN Based Meta-Learning for Noisy Image Classification and Template Matching

CNN Based Meta-Learning for Noisy Image Classification and Template Matching Introduction This master thesis used a few-shot meta learning approach to

Kumar Manas 2 Dec 09, 2021
"Learning Free Gait Transition for Quadruped Robots vis Phase-Guided Controller"

PhaseGuidedControl The current version is developed based on the old version of RaiSim series, and possibly requires further modification. It will be

X-Mechanics 12 Oct 21, 2022
Train DeepLab for Semantic Image Segmentation

Train DeepLab for Semantic Image Segmentation Martin Kersner, [email protected]

Martin Kersner 172 Dec 14, 2022
End-to-end Temporal Action Detection with Transformer. [Under review]

TadTR: End-to-end Temporal Action Detection with Transformer By Xiaolong Liu, Qimeng Wang, Yao Hu, Xu Tang, Song Bai, Xiang Bai. This repo holds the c

Xiaolong Liu 105 Dec 25, 2022