Code for Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

Overview

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

We propose Disentangled Audio-Visual System (DAVS) to address arbitrary-subject talking face generation in this work, which aims to synthesize a sequence of face images that correspond to given speech semantics, conditioning on either an unconstrained speech audio or video.

[Project] [Paper] [Demo]

Recommondation of our CVPR21 repo

This repo is barely maintaining since the version of this code is out of date. If you are interested in the topic of Talking Face Generation, feel free to try the CODE of our CVPR2021 PAPER!

Requirements

Generating test results

Create the default folder "checkpoints" and put the checkpoint in it or get the CHECKPOINT_PATH
  • Samples for testing can be found in this folder named 0572_0019_0003. This is a pre-processed sample from the Voxceleb Dataset.

  • Run the testing script to generate videos from video:

python test_all.py  --test_root ./0572_0019_0003/video --test_type video --test_audio_video_length 99 --test_resume_path CHECKPOINT_PATH
  • Run the testing script to generate videos from audio:
python test_all.py  --test_root ./0572_0019_0003/audio --test_type audio --test_audio_video_length 99 --test_resume_path CHECKPOINT_PATH

Sample Results

  • Talking Effect on Human Characters

  • Talking Effect on Non-human Characters (Trained on Human Faces Only)

Create more samples

  • The face detection tool used in the demo videos can be found at RSA. It will return a Matfile with 5 key point locations in a row for each image. Other face alignment methods are also appliable such as dlib. The key points for face alignement we used are the two for the center of the eyes and the average point of the corners of the mouth. With each image's PATH and the face POINTS, you can find our way of face alignment at preprocess/face_align.py.

  • Our preprocessing of the audio files is the same and borrowed from the matlab code of SyncNet. Then we save the mfcc features into bin files.

Preparing Training Data

  • We used the LRW dataset for training.
  • The directories are arranged like this:
data
├── train, val, test
|	├── 0, 1, 2 ... 499 (one folder for each class)
|	│   ├── 0, 1, 2 ... #videos per class
|	│   │   ├── align_face256
|	│   │   |   ├── 0, 1, ... 28.jpg
|	│   |   ├── mfcc20
|	│   │   |   ├── 2, 3 ... 26.bin

where each video is extracted to frames and aligned using our protocol, and each audio is processed and saved using Matlab.

Training

python train.py
  • This is still a beta version of the training code which only disentangles wid information from pid space. Running the train.py only might not be able to fully reproduce the paper. However, it can be served as a reference for how we implement the whole training process.
  • During our own implementation, the classification part (without generation and disentanglement) is pretrained first. The pretraining training code is temporarily not provided.

Postprocessing Details (Optional)

  • The directly generated results may suffer from a "zoom-in-and-out" condition which we assume is caused by our alignment of the training set. We solve the unstable problem using Subspace Video Stabilization in the demos.

License and Citation

The use of this software is RESTRICTED to non-commercial research and educational purposes.

@inproceedings{zhou2019talking,
  title     = {Talking Face Generation by Adversarially Disentangled Audio-Visual Representation},
  author    = {Zhou, Hang and Liu, Yu and Liu, Ziwei and Luo, Ping and Wang, Xiaogang},
  booktitle = {AAAI Conference on Artificial Intelligence (AAAI)},
  year      = {2019},
}

Acknowledgement

The structure of this codebase is borrowed from pix2pix.

Owner
Hang_Zhou
Ph.D. @ MMLab-CUHK
Hang_Zhou
Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data - Official PyTorch Implementation (CVPR 2022)

Commonality in Natural Images Rescues GANs: Pretraining GANs with Generic and Privacy-free Synthetic Data (CVPR 2022) Potentials of primitive shapes f

31 Sep 27, 2022
[NeurIPS 2021] Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods

Large Scale Learning on Non-Homophilous Graphs: New Benchmarks and Strong Simple Methods Large Scale Learning on Non-Homophilous Graphs: New Benchmark

60 Jan 03, 2023
Official code for "Maximum Likelihood Training of Score-Based Diffusion Models", NeurIPS 2021 (spotlight)

Maximum Likelihood Training of Score-Based Diffusion Models This repo contains the official implementation for the paper Maximum Likelihood Training o

Yang Song 84 Dec 12, 2022
Multi-tool reverse engineering collaboration solution.

CollaRE v0.3 Intorduction CollareRE is a tool for collaborative reverse engineering that aims to allow teams that do need to use more then one tool du

105 Nov 27, 2022
Perspective: Julia for Biologists

Perspective: Julia for Biologists 1. Examples Speed: Example 1 - Single cell data and network inference Domain: Single cell data Methodology: Network

Elisabeth Roesch 55 Dec 02, 2022
Long Expressive Memory (LEM)

Long Expressive Memory for Sequence Modeling This repository contains the implementation to reproduce the numerical experiments of the paper Long Expr

Konstantin Rusch 47 Dec 17, 2022
[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | 斗地主AI

[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning DouZero is a reinforcement learning framework for DouDizhu (斗地主), t

Kwai Inc. 3.1k Jan 04, 2023
Complete* list of autonomous driving related datasets

AD Datasets Complete* and curated list of autonomous driving related datasets Contributing Contributions are very welcome! To add or update a dataset:

Daniel Bogdoll 13 Dec 19, 2022
Official page of Struct-MDC (RA-L'22 with IROS'22 option); Depth completion from Visual-SLAM using point & line features

Struct-MDC (click the above buttons for redirection!) Official page of "Struct-MDC: Mesh-Refined Unsupervised Depth Completion Leveraging Structural R

Urban Robotics Lab. @ KAIST 37 Dec 22, 2022
We are More than Our JOints: Predicting How 3D Bodies Move

We are More than Our JOints: Predicting How 3D Bodies Move Citation This repo contains the official implementation of our paper MOJO: @inproceedings{Z

72 Oct 20, 2022
A framework for analyzing computer vision models with simulated data

3DB: A framework for analyzing computer vision models with simulated data Paper Quickstart guide Blog post Installation Follow instructions on: https:

3DB 112 Jan 01, 2023
A symbolic-model-guided fuzzer for TLS

tlspuffin TLS Protocol Under FuzzINg A symbolic-model-guided fuzzer for TLS Master Thesis | Thesis Presentation | Documentation Disclaimer: The term "

69 Dec 20, 2022
GrailQA: Strongly Generalizable Question Answering

GrailQA is a new large-scale, high-quality KBQA dataset with 64,331 questions annotated with both answers and corresponding logical forms in different syntax (i.e., SPARQL, S-expression, etc.). It ca

OSU DKI Lab 76 Dec 21, 2022
Implementation for our AAAI2021 paper (Entity Structure Within and Throughout: Modeling Mention Dependencies for Document-Level Relation Extraction).

SSAN Introduction This is the pytorch implementation of the SSAN model (see our AAAI2021 paper: Entity Structure Within and Throughout: Modeling Menti

benfeng 69 Nov 15, 2022
Anchor-free Oriented Proposal Generator for Object Detection

Anchor-free Oriented Proposal Generator for Object Detection Gong Cheng, Jiabao Wang, Ke Li, Xingxing Xie, Chunbo Lang, Yanqing Yao, Junwei Han, Intro

jbwang1997 56 Nov 15, 2022
Official Repository for our ECCV2020 paper: Imbalanced Continual Learning with Partitioning Reservoir Sampling

Imbalanced Continual Learning with Partioning Reservoir Sampling This repository contains the official PyTorch implementation and the dataset for our

Chris Dongjoo Kim 40 Sep 18, 2022
The source code of CVPR 2019 paper "Deep Exemplar-based Video Colorization".

Deep Exemplar-based Video Colorization (Pytorch Implementation) Paper | Pretrained Model | Youtube video 🔥 | Colab demo Deep Exemplar-based Video Col

Bo Zhang 253 Dec 27, 2022
Team nan solution repository for FPT data-centric competition. Data augmentation, Albumentation, Mosaic, Visualization, KNN application

FPT_data_centric_competition - Team nan solution repository for FPT data-centric competition. Data augmentation, Albumentation, Mosaic, Visualization, KNN application

Pham Viet Hoang (Harry) 2 Oct 30, 2022
Composing methods for ML training efficiency

MosaicML Composer contains a library of methods, and ways to compose them together for more efficient ML training.

MosaicML 2.8k Jan 08, 2023
Pytorch implementation of TailCalibX : Feature Generation for Long-tail Classification

TailCalibX : Feature Generation for Long-tail Classification by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi [arXiv] [

Rahul Vigneswaran 34 Jan 02, 2023