Implementation of Stochastic Image-to-Video Synthesis using cINNs.

Stochastic Image-to-Video Synthesis using cINNs

Official PyTorch implementation of Stochastic Image-to-Video Synthesis using cINNs, accepted to CVPR 2021.

Arxiv | Project Page | Supplemental | Pretrained Models | BibTeX

Michael Dorkenwald, Timo Milbich, Andreas Blattmann, Robin Rombach, Kosta Derpanis*, Björn Ommer*, CVPR 2021

tl;dr We present a framework for both stochastic and controlled image-to-video synthesis. We bridge the gap between the image and video domains using conditional invertible neural networks and account for the inherent ambiguity with a learned, dedicated scene dynamics representation.

For any questions, issues, or recommendations, please contact Michael at m.dorkenwald(at)gmail.com. If our project is helpful for your research, please consider citing it.

Table of Contents

  1. Requirements
  2. Running pretrained models
  3. Data preparation
  4. Evaluation
    1. Synthesis quality
    2. Diversity
  5. Training
    1. Stage 1: Video-to-Video synthesis
    2. Stage 2: cINN for Image-to-Video synthesis
  6. Shout-outs
  7. BibTeX

Requirements

A suitable conda environment named i2v can be created and activated with

conda env create -f environment.yaml
conda activate i2v

This repository uses CUDA version 11.1. To suppress the warnings emitted by kornia, please run all Python scripts with -W ignore.
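
When a script is not launched from the command line (for example, from a notebook), the effect of the flag can be approximated in code. The following is a minimal sketch using only the standard library; `noisy_call` is a hypothetical stand-in for a library call that emits warnings and is not part of this repository.

```python
import warnings

def noisy_call():
    # Hypothetical stand-in for a library call that emits a warning.
    warnings.warn("this API is deprecated", UserWarning)
    return 42

# record=True collects every warning that gets past the active filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("ignore")  # same action that `python -W ignore` installs
    result = noisy_call()

print(result, len(caught))
```

The filter is scoped to the `with` block here, so warnings outside of it are reported as usual.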

Running pretrained models

To test our method, place the pretrained model weights for the desired dataset (e.g. bair) in the models folder (e.g. models/bair/) and run the scripts below on the images placed in assets/GT_samples.

python -W ignore generate_samples.py -dataset landscape -gpu <gpu_id> -seq_length <sequence_length>

Moreover, one can also transfer an observed dynamic from a given video (first row) to an arbitrary starting frame using

python -W ignore generate_transfer.py -dataset landscape -gpu <gpu_id> 

python -W ignore generate_samples.py -dataset bair -gpu <gpu_id> 

Our model can be extended to control specific factors, e.g. the endpoint location of the robot arm. Note that running this script requires downloading the BAIR dataset.

python -W ignore visualize_endpoint.py -dataset bair -gpu <gpu_id> -data_path <path2data>

Alternatively, look only at the last frame of each generated sequence; these frames are similar because all videos were conditioned on the same endpoint.

python -W ignore generate_samples.py -dataset iPER -gpu <GPU_ID>

python -W ignore generate_samples.py -dataset DTDB -gpu <GPU_ID> -texture fire

python -W ignore generate_samples.py -dataset DTDB -gpu <GPU_ID> -texture vegetation

python -W ignore generate_samples.py -dataset DTDB -gpu <GPU_ID> -texture clouds

python -W ignore generate_samples.py -dataset DTDB -gpu <GPU_ID> -texture waterfall

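
The four DTDB commands above differ only in the texture name, so they can be generated in a loop. The sketch below only builds the argument lists and prints them; the helper `build_dtdb_commands` is hypothetical, while the script name and flags are the ones shown above.

```python
import sys

TEXTURES = ["fire", "vegetation", "clouds", "waterfall"]

def build_dtdb_commands(gpu_id, textures=TEXTURES):
    """Return one argv list per texture, mirroring the commands above."""
    return [
        [sys.executable, "-W", "ignore", "generate_samples.py",
         "-dataset", "DTDB", "-gpu", str(gpu_id), "-texture", texture]
        for texture in textures
    ]

commands = build_dtdb_commands(gpu_id=0)
for argv in commands:
    print(" ".join(argv))
    # To actually run a command from the repository root:
    # subprocess.run(argv, check=True)  (requires `import subprocess`)
```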

Data preparation

BAIR

To download the dataset to a given target directory <TARGETDIR>, run the following command

sh data/bair/download_bair.sh <TARGETDIR>

To convert the TensorFlow records files, run the following command

python data/bair/convert_bair.py --data_dir <DATADIR> --output_dir <TARGETDIR>

traj_256_to_511 is used for validation and traj_0_to_255 for testing. The resulting folder structure should be the following

$bair/train/
├── traj_512_to_767
│   ├── 1
│   │   ├── 0.png
│   │   ├── 1.png
│   │   ├── 2.png
│   │   ├── ...
│   ├── 2
│   ├── ...
├── ...
$bair/eval/
├── traj_256_to_511
│   ├── 1
│   │   ├── 0.png
│   │   ├── 1.png
│   │   ├── 2.png
│   │   ├── ...
│   ├── 2
│   ├── ...
$bair/test/
├── traj_0_to_255
│   ├── 1
│   │   ├── 0.png
│   │   ├── 1.png
│   │   ├── 2.png
│   │   ├── ...
│   ├── 2
│   ├── ...
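
To check that a converted split matches the layout above, the sequences can be indexed with a few lines of Python. This is a minimal sketch, assuming every frame is a PNG whose file name is its integer frame number; `index_sequences` is a hypothetical helper and not the data loader of this repository.

```python
import tempfile
from pathlib import Path

def index_sequences(split_dir):
    """Map 'traj_x_to_y/<seq>' to its frame paths, sorted by frame number."""
    index = {}
    for traj_dir in sorted(Path(split_dir).glob("traj_*")):
        for seq_dir in sorted(p for p in traj_dir.iterdir() if p.is_dir()):
            # Sort numerically so that 10.png comes after 2.png.
            frames = sorted(seq_dir.glob("*.png"), key=lambda p: int(p.stem))
            index[f"{traj_dir.name}/{seq_dir.name}"] = frames
    return index

# Demo on a throwaway directory that mimics the layout above.
with tempfile.TemporaryDirectory() as root:
    seq = Path(root) / "train" / "traj_512_to_767" / "1"
    seq.mkdir(parents=True)
    for i in (0, 1, 2, 10):
        (seq / f"{i}.png").touch()
    index = index_sequences(Path(root) / "train")
    frame_names = [p.name for p in index["traj_512_to_767/1"]]

print(frame_names)
```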

Please cite the corresponding paper if you use the data.

Landscape

Download the corresponding dataset from here, e.g. using gdown. Our provided data loader expects the images of each video to be named consecutively from frame0 to frameX, which avoids problems with missing frames. The following script renames them accordingly

python data/landscape/rename_images.py --data_dir <DATADIR> 

In data/landscape we provide a list of videos that were used for training and testing. Please cite the corresponding paper if you use the data.
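
The renaming step described above can be illustrated as follows. This is a sketch of the idea only and not the contents of data/landscape/rename_images.py: it assumes one directory per video and that the natural order of the file names matches the frame order, and `rename_frames` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path

def natural_key(path):
    # Split the name into digit and non-digit runs so "img_10" sorts after "img_7".
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r"(\d+)", path.name)]

def rename_frames(video_dir, prefix="frame"):
    """Rename all files in video_dir to prefix0..prefixX in natural order."""
    frames = sorted((p for p in Path(video_dir).iterdir() if p.is_file()),
                    key=natural_key)
    # Pass 1: move to temporary names so no target name is overwritten.
    staged = []
    for i, src in enumerate(frames):
        tmp = src.with_name(f"tmp_rename_{i}{src.suffix}")
        src.rename(tmp)
        staged.append(tmp)
    # Pass 2: assign the final, gap-free names.
    new_names = []
    for i, tmp in enumerate(staged):
        dst = tmp.with_name(f"{prefix}{i}{tmp.suffix}")
        tmp.rename(dst)
        new_names.append(dst.name)
    return new_names

# Demo: a video whose frame numbers have gaps (2, 7, 10).
with tempfile.TemporaryDirectory() as d:
    for n in (2, 7, 10):
        (Path(d) / f"img_{n}.jpg").write_text(str(n))
    new_names = rename_frames(d)
    contents = [(Path(d) / name).read_text() for name in new_names]

print(new_names, contents)
```

The two-pass rename keeps the sketch safe when some files already carry a target name such as frame0.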

iPER

Download the dataset from here and run

python data/iPER/extract_iPER.py --raw_dir <DATADIR> --processed_dir <TARGETDIR>

to extract the frames. In data/iPER we provide a list of videos that were used for train, eval, and test. Please cite the corresponding paper if you use the data.

Dynamic Textures

Download the corresponding dataset from here and unzip it. Please cite the corresponding paper if you use the data. The original mp4 files from DTDB can be downloaded from here.

Evaluation

After storing the data as described, the evaluation script for each dataset can be used.

Synthesis quality

We use the following metrics to measure synthesis quality: LPIPS, FID, FVD, and DTFVD. The latter was introduced in this work and is a specific FVD for dynamic textures. To compute it, please download the weights of the I3D model from here and place them in the models folder like /models/DTI3D/. For more details on DTFVD, please see Sec. C3 in the supplemental. To compute the mentioned metrics for a given dataset, please run

python -W ignore eval_synthesis_quality.py -gpu <gpu_id> -dataset <dataset> -data_path <path2data> -FVD True -LPIPS True -FID True -DTFVD True

For DTDB, please specify the dynamic texture you want to evaluate, e.g. fire

python -W ignore eval_synthesis_quality.py -gpu <gpu_id> -dataset DTDB -data_path <path2data> -texture fire -FVD True -LPIPS True -FID True -DTFVD True

Please cite our work if you use DTFVD in your work. If you place the checkpoints outside this repository, please specify their location using the argument -chkpt <path_to_chkpt>.
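
FID and FVD both compare feature statistics of real and generated samples with a Fréchet distance between two fitted Gaussians. As a rough illustration only, the sketch below computes that distance under the simplifying assumption of diagonal covariances. The actual metrics use full covariance matrices and features from pretrained networks, and `diagonal_frechet_distance` is a hypothetical helper that is not part of this repository.

```python
from statistics import fmean, pstdev

def diagonal_frechet_distance(feats_a, feats_b):
    """Squared Fréchet distance between Gaussians with diagonal covariance.

    feats_a, feats_b: lists of feature vectors with the same dimensionality.
    """
    dims_a, dims_b = list(zip(*feats_a)), list(zip(*feats_b))
    # Squared distance between the per-dimension means.
    mean_term = sum((fmean(a) - fmean(b)) ** 2 for a, b in zip(dims_a, dims_b))
    # For diagonal covariances the trace term reduces to (sigma_a - sigma_b)^2.
    cov_term = sum((pstdev(a) - pstdev(b)) ** 2 for a, b in zip(dims_a, dims_b))
    return mean_term + cov_term

real = [[0.0, 0.0], [2.0, 2.0]]   # toy "real" features
fake = [[1.0, 1.0], [5.0, 5.0]]   # toy "generated" features
distance = diagonal_frechet_distance(real, fake)
same = diagonal_frechet_distance(real, real)

print(distance, same)
```

Lower values mean the two feature distributions are closer; identical sets give a distance of zero.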

Diversity

We measure diversity by comparing different realizations of an example using pretrained VGG, I3D, and DTI3D backbones. The last two take the temporal structure of the data into account, whereas the VGG diversity score compares images frame by frame. To evaluate diversity for a given dataset, please run

python -W ignore eval_diversity.py -gpu <gpu_id> -dataset <dataset> -data_path <path2data> -DTI3D True -VGG True -I3D True -seq_length <length>

For DTDB, please specify the dynamic texture you want to evaluate, e.g. fire

python -W ignore eval_diversity.py -gpu <gpu_id> -dataset DTDB -data_path <path2data> -texture fire -DTI3D True -VGG True -I3D True 
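
The idea behind such a diversity score can be sketched as the average distance between the features of several realizations generated from the same starting frame. The distance function used by eval_diversity.py is not stated here, so the Euclidean distance below is an assumption, and `mean_pairwise_distance` is a hypothetical helper operating on toy feature vectors instead of backbone features.

```python
import math
from itertools import combinations

def mean_pairwise_distance(features):
    """Average Euclidean distance over all pairs of feature vectors."""
    pairs = list(combinations(features, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

# Toy features of three realizations of the same example.
identical = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
varied = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
score_identical = mean_pairwise_distance(identical)
score_varied = mean_pairwise_distance(varied)

print(score_identical, score_varied)
```

A model that always produces the same video scores zero, and more varied realizations score higher.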

Training

The training of our models is divided into two consecutive stages. In stage 1, we learn an information-preserving video latent representation using a conditional generative model that reconstructs the given input video as well as possible. In stage 2, we learn a conditional INN that maps the video latent representation to a residual space capturing the scene dynamics, conditioned on the starting frame and additional control factors. During inference, we can then sample new scene dynamics from the residual distribution and, thanks to the bijective nature of the cINN, synthesize novel videos. For more details, please check out our paper.
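
The role of bijectivity can be illustrated with a toy, one-dimensional conditional affine map. This is a sketch under strong simplifications and not the architecture used in this repository: `scale_and_shift` is a hypothetical fixed function standing in for a learned conditioning network, scalars stand in for the video latent and the encoded starting frame, and a standard normal residual distribution is assumed.

```python
import math
import random

def scale_and_shift(cond):
    # Hypothetical conditioning function; a real cINN learns this with a network.
    return 0.5 * math.tanh(cond), 2.0 * cond  # (log-scale, shift)

def forward(z, cond):
    """Video latent z -> residual r, given the conditioning cond."""
    log_s, t = scale_and_shift(cond)
    return (z - t) * math.exp(-log_s)

def inverse(r, cond):
    """Residual r -> video latent z; exact inverse of forward for the same cond."""
    log_s, t = scale_and_shift(cond)
    return r * math.exp(log_s) + t

cond = 0.7   # stands in for the encoded starting frame
z = 1.3      # stands in for a stage-1 video latent
z_roundtrip = inverse(forward(z, cond), cond)

# Inference: draw residuals from a standard normal and map them back to latents.
rng = random.Random(0)
sampled_latents = [inverse(rng.gauss(0.0, 1.0), cond) for _ in range(3)]

print(z_roundtrip, sampled_latents)
```

Because the map is invertible for every condition, each sampled residual corresponds to exactly one latent, which the stage-1 decoder would then turn into a video.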

For logging our runs we used and recommend wandb. Please create a free account and add your username to the config. If you don't want to use it, set the logging mode to offline; the metrics are also logged to a CSV file and samples are written to the specified chkpt folder.

For logging (PyTorch) FVD, please download the weights of a PyTorch I3D from here and place them in the models folder like /models/PI3D/. For logging DTFVD, please download the weights of the DTI3D model from here and place them in the models folder like /models/DTI3D/. Depending on the dataset, specify either FVD or DTFVD under FVD in the config.

For each provided pretrained model, we left the corresponding config file in its folder. If you want to run our model on a dataset we did not provide, please create a new config. Before you start a run, specify the data path, save path, and run name in the config.

Stage 1: Video-to-Video synthesis

To train the conditional generative model for video-to-video synthesis run the following command

python -W ignore -m stage1_VAE.main -gpu <gpu_id> -cf stage1_VAE/configs/<config>

Stage 2: cINN for Image-to-Video synthesis

Before we can train the cINN, we need to train an AE to obtain an encoder that embeds the starting frame for the cINN. You can use the one provided or train your own by running

python -W ignore -m stage2_cINN.AE.main -gpu <gpu_id> -cf stage2_cINN/AE/configs/<config>

To train the cINN, specify the locations of the trained encoder and of the first-stage model in the config. After that, training of the cINN can be started with

python -W ignore -m stage2_cINN.main -gpu <gpu_id> -cf stage2_cINN/configs/<config>

To reproduce the controlled video synthesis experiment, set control to True in bair_config.yaml to additionally condition the cINN on the endpoint location.

Shout-outs

Thanks to everyone who makes their code and models available. In particular,

  • The decoder architecture is inspired by SPADE
  • The great work and code of Stochastic Latent Residual Video Prediction SRVP
  • The 3D encoder and discriminator are based on 3D-ResNet, and the spatial discriminator is adapted from PatchGAN
  • The metrics we used: LPIPS, PyTorch FID, and FVD

BibTeX

@misc{dorkenwald2021stochastic,
      title={Stochastic Image-to-Video Synthesis using cINNs}, 
      author={Michael Dorkenwald and Timo Milbich and Andreas Blattmann and Robin Rombach and Konstantinos G. Derpanis and Björn Ommer},
      year={2021},
      eprint={2105.04551},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}