An unofficial implementation of the paper "AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss".

Last update: Jun 16, 2022

Overview

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss

This is an unofficial implementation of AutoVC based on the official one.

The repository is still under construction, so some details may be missing or incomplete.

Preprocessing

python preprocess.py <data_path> <save_path> <encoder_path> [--seg_len seg] [--n_workers workers]
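The usage line above takes three required positional arguments and two optional flags. As a rough illustration only, the sketch below mirrors that command line with `argparse`. It is not the repository's actual parser: the integer types and the help strings are assumptions, and the defaults are left unset because the README does not state them.

```python
import argparse


def build_preprocess_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented preprocess.py usage line."""
    parser = argparse.ArgumentParser(prog="preprocess.py")
    # Required positional arguments, in the order shown in the usage line.
    parser.add_argument("data_path", help="directory with the raw audio (assumed meaning)")
    parser.add_argument("save_path", help="directory for the preprocessed output (assumed meaning)")
    parser.add_argument("encoder_path", help="path to a speaker encoder checkpoint (assumed meaning)")
    # Optional flags; the int type is an assumption, and no default is invented here.
    parser.add_argument("--seg_len", type=int, default=None)
    parser.add_argument("--n_workers", type=int, default=None)
    return parser


# Example with placeholder paths; these file names are hypothetical.
args = build_preprocess_parser().parse_args(
    ["wavs", "feats", "encoder.pt", "--seg_len", "128", "--n_workers", "4"]
)
print(args.data_path, args.save_path, args.encoder_path, args.seg_len, args.n_workers)
```

Omitting `--seg_len` or `--n_workers` in this sketch leaves the value as `None`; the real script presumably falls back to its own defaults.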

Training

python train.py <config> <data_path> <save_path> [--n_steps steps] [--save_steps save] [--log_steps log] [--batch_size batch] [--seg_len seg]
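When launching several training runs from a script, it can help to assemble the command shown above programmatically. The helper below is a minimal sketch under stated assumptions: `build_train_command` is a hypothetical name that is not part of this repository, and the example file names are placeholders. It only builds the argument list and passes through the flags that were explicitly given.

```python
from typing import List, Optional


def build_train_command(
    config: str,
    data_path: str,
    save_path: str,
    n_steps: Optional[int] = None,
    save_steps: Optional[int] = None,
    log_steps: Optional[int] = None,
    batch_size: Optional[int] = None,
    seg_len: Optional[int] = None,
    python_bin: str = "python",
) -> List[str]:
    """Return an argv list for train.py following the documented usage line."""
    command = [python_bin, "train.py", config, data_path, save_path]
    # Flags are appended in the order they appear in the usage line;
    # unset options are skipped so the script keeps its own defaults.
    optional = {
        "--n_steps": n_steps,
        "--save_steps": save_steps,
        "--log_steps": log_steps,
        "--batch_size": batch_size,
        "--seg_len": seg_len,
    }
    for flag, value in optional.items():
        if value is not None:
            command += [flag, str(value)]
    return command


# Placeholder arguments; "config.yaml", "feats" and "ckpt" are hypothetical names.
cmd = build_train_command("config.yaml", "feats", "ckpt", n_steps=1000, batch_size=2)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the repository root to start training.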

Reference

Please cite the original paper if you find this implementation useful.

@InProceedings{pmlr-v97-qian19c,
  title = {{A}uto{VC}: Zero-Shot Voice Style Transfer with Only Autoencoder Loss},
  author = {Qian, Kaizhi and Zhang, Yang and Chang, Shiyu and Yang, Xuesong and Hasegawa-Johnson, Mark},
  pages = {5210--5219},
  year = {2019},
  editor = {Kamalika Chaudhuri and Ruslan Salakhutdinov},
  volume = {97},
  series = {Proceedings of Machine Learning Research},
  address = {Long Beach, California, USA},
  month = {09--15 Jun},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v97/qian19c/qian19c.pdf},
  url = {http://proceedings.mlr.press/v97/qian19c.html}
}

Owner

Chien-yu Huang
