Layered Neural Atlases for Consistent Video Editing

Overview

Layered Neural Atlases for Consistent Video Editing

Project Page | Paper

This repository contains an implementation for the SIGGRAPH Asia 2021 paper Layered Neural Atlases for Consistent Video Editing.

The paper introduces the first approach for neural video unwrapping using an end-to-end optimized interpretable and semantic atlas-based representation, which facilitates easy and intuitive editing in the atlas domain.

Installation Requirements

The code is compatible with Python 3.7 and PyTorch 1.6.

You can create an anaconda environment called neural_atlases with the required dependencies by running:

conda create --name neural_atlases python=3.7 
conda activate neural_atlases 
conda install pytorch=1.6.0 torchvision=0.7.0 cudatoolkit=10.1 matplotlib tensorboard scipy  scikit-image tqdm  opencv -c pytorch
pip install imageio-ffmpeg gdown
python -m pip install detectron2 -f   https://dl.fbaipublicfiles.com/detectron2/wheels/cu101/torch1.6/index.html

Data convention

The code expects 3 folders for each video input, e.g. for a video of 50 frames named "blackswan":

  1. data/blackswan: A folder of video frames containing image files in the following convention: blackswan/00000.jpg,blackswan/00001.jpg,...,blackswan/00049.jpg (as in the DAVIS dataset).
  2. data/blackswan_flow: A folder with forward and backward optical flow files in the following convention: blackswan_flow/00000.jpg_00001.jpg.npy,blackswan_flow/00001.jpg_00000.jpg,...,blackswan_flow/00049.jpg_00048.jpg.npy.
  3. data/blackswan_maskrcnn: A folder with rough masks (created by Mask-RCNN or any other way) containing files in the following convention: blackswan_maskrcnn/00000.jpg,blackswan_maskrcnn/00001.jpg,...,blackswan_maskrcnn/00049.jpg

For a few examples of DAVIS sequences run:

gdown https://drive.google.com/uc?id=1WipZR9LaANTNJh764ukznXXAANJ5TChe
unzip data.zip

Masks extraction

Given only the video frames folder data/blackswan it is possible to extract the Mask-RCNN masks (and create the required folder data/blackswan_maskrcnn) by running:

python preprocess_mask_rcnn.py --vid-path data/blackswan --class_name bird

where --class_name determines the COCO class name of the sought foreground object. It is also possible to choose the first instance retrieved by Mask-RCNN by using --class_name anything. This is usefull for cases where Mask-RCNN gets correct masks with wrong classes as in the "libby" video:

python preprocess_mask_rcnn.py --vid-path data/libby --class_name anything

Optical flows extraction

Furthermore, the optical flow folder can be extracted using RAFT. For linking RAFT into the current project run:

git submodule update --init
cd thirdparty/RAFT/
./download_models.sh
cd ../..

For extracting the optical flows (and creating the required folder data/blackswan_flow) run:

python preprocess_optical_flow.py --vid-path data/blackswan --max_long_edge 768

Pretrained models

For downloading a sample set of our pretrained models together with sample edits run:

gdown https://drive.google.com/uc?id=10voSCdMGM5HTIYfT0bPW029W9y6Xij4D
unzip pretrained_models.zip

Training

For training a model on a video, run:

python train.py config/config.json

where the video frames folder is determined by the config parameter "data_folder". Note that in order to reduce the training time it is possible to reduce the evaluation frequency controlled by the parameter "evaluate_every" (e.g. by changing it to 10000). The other configurable parameters are documented inside the file train.py.

Evaluation

During training, the model is evaluated. For running only evaluation on a trained folder run:

python only_evaluate.py --trained_model_folder=pretrained_models/checkpoints/blackswan --video_name=blackswan --data_folder=data --output_folder=evaluation_outputs

where trained_model_folder is the path to a folder that contains the config.json and checkpoint files of the trained model.

Editing

To apply editing, run the script only_edit.py. Examples for the supplied pretrained models for "blackswan" and "boat":

python only_edit.py --trained_model_folder=pretrained_models/checkpoints/blackswan --video_name=blackswan --data_folder=data --output_folder=editing_outputs --edit_foreground_path=pretrained_models/edit_inputs/blackswan/edit_blackswan_foreground.png --edit_background_path=pretrained_models/edit_inputs/blackswan/edit_blackswan_background.png
python only_edit.py --trained_model_folder=pretrained_models/checkpoints/boat --video_name=boat --data_folder=data --output_folder=editing_outputs --edit_foreground_path=pretrained_models/edit_inputs/boat/edit_boat_foreground.png --edit_background_path=pretrained_models/edit_inputs/boat/edit_boat_backgound.png

Where edit_foreground_path and edit_background_path specify the paths to 1000x1000 images of the RGBA atlas edits.

For applying an edit that was done on a frame (e.g. for the pretrained "libby"):

python only_edit.py --trained_model_folder=pretrained_models/checkpoints/libby --video_name=libby --data_folder=data --output_folder=editing_outputs  --use_edit_frame --edit_frame_index=7 --edit_frame_path=pretrained_models/edit_inputs/libby/edit_frame_.png

Citation

If you find our work useful in your research, please consider citing:

@article{kasten2021layered,
  title={Layered Neural Atlases for Consistent Video Editing},
  author={Kasten, Yoni and Ofri, Dolev and Wang, Oliver and Dekel, Tali},
  journal={arXiv preprint arXiv:2109.11418},
  year={2021}
}
Owner
Yoni Kasten
Yoni Kasten
Keeper for Ricochet Protocol, implemented with Apache Airflow

Ricochet Keeper This repository contains Apache Airflow DAGs for executing keeper operations for Ricochet Exchange. Usage You will need to run this us

Ricochet Exchange 5 May 24, 2022
K-Means Clustering and Hierarchical Clustering Unsupervised Learning Solution in Python3.

Unsupervised Learning - K-Means Clustering and Hierarchical Clustering - The Heritage Foundation's Economic Freedom Index Analysis 2019 - By David Sal

David Salako 1 Jan 12, 2022
Machine learning and Deep learning models, deploy on telegram (the best social media)

Semi Intelligent BOT The project involves : Classifying fake news Classifying objects such as aeroplane, automobile, bird, cat, deer, dog, frog, horse

MohammadReza Norouzi 5 Mar 06, 2022
Facial detection, landmark tracking and expression transfer library for Windows, Linux and Mac

Welcome to the CSIRO Face Analysis SDK. Documentation for the SDK can be found in doc/documentation.html. All code in this SDK is provided according t

Luiz Carlos Vieira 7 Jul 16, 2020
交互式标注软件,暂定名 iann

iann 交互式标注软件,暂定名iann。 安装 按照官网介绍安装paddle。 安装其他依赖 pip install -r requirements.txt 运行 git clone https://github.com/PaddleCV-SIG/iann/ cd iann python iann

294 Dec 30, 2022
This is the repository for our paper Ditch the Gold Standard: Re-evaluating Conversational Question Answering

Ditch the Gold Standard: Re-evaluating Conversational Question Answering This is the repository for our paper Ditch the Gold Standard: Re-evaluating C

Princeton Natural Language Processing 38 Dec 16, 2022
Extremely easy multi instancing software for minecraft speedrunning.

Easy Multi Extremely easy multi/single instancing software for minecraft speedrunning. A couple of goals of this project: Setup multi in minutes No fi

Duncan 8 Jul 16, 2022
Pixel-wise segmentation on VOC2012 dataset using pytorch.

PiWiSe Pixel-wise segmentation on the VOC2012 dataset using pytorch. FCN SegNet PSPNet UNet RefineNet For a more complete implementation of segmentati

Bodo Kaiser 378 Dec 30, 2022
Diverse Image Captioning with Context-Object Split Latent Spaces (NeurIPS 2020)

Diverse Image Captioning with Context-Object Split Latent Spaces This repository is the PyTorch implementation of the paper: Diverse Image Captioning

Visual Inference Lab @TU Darmstadt 34 Nov 21, 2022
Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides

Predicting Axillary Lymph Node Metastasis in Early Breast Cancer Using Deep Learning on Primary Tumor Biopsy Slides Project | This repo is the officia

CVSM Group - email: <a href=[email protected]"> 33 Dec 28, 2022
A Convolutional Transformer for Keyword Spotting

☢️ Audiomer ☢️ Audiomer: A Convolutional Transformer for Keyword Spotting [ arXiv ] [ Previous SOTA ] [ Model Architecture ] Results on SpeechCommands

49 Jan 27, 2022
MNE: Magnetoencephalography (MEG) and Electroencephalography (EEG) in Python

MNE-Python MNE-Python software is an open-source Python package for exploring, visualizing, and analyzing human neurophysiological data such as MEG, E

MNE tools for MEG and EEG data analysis 2.1k Dec 28, 2022
JittorVis - Visual understanding of deep learning models

JittorVis: Visual understanding of deep learning model JittorVis is an open-source library for understanding the inner workings of Jittor models by vi

thu-vis 182 Jan 06, 2023
Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning"

Prompt-Tuning Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning" Currently, we support the following huggigface models: Bart

Andrew Zeng 36 Dec 19, 2022
tinykernel - A minimal Python kernel so you can run Python in your Python

tinykernel - A minimal Python kernel so you can run Python in your Python

fast.ai 37 Dec 02, 2022
NaturalCC is a sequence modeling toolkit that allows researchers and developers to train custom models

NaturalCC NaturalCC is a sequence modeling toolkit that allows researchers and developers to train custom models for many software engineering tasks,

159 Dec 28, 2022
Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper

Evaluating the Factual Consistency of Abstractive Text Summarization Authors: Wojciech Kryściński, Bryan McCann, Caiming Xiong, and Richard Socher Int

Salesforce 165 Dec 21, 2022
Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop) (Pronounced as "strog") Paper Arxiv Why it matters? Scene Text Recognition (STR) req

Rowel Atienza 152 Dec 28, 2022
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

Machine Learning From Scratch About Python implementations of some of the fundamental Machine Learning models and algorithms from scratch. The purpose

Erik Linder-Norén 21.8k Jan 09, 2023
[arXiv22] Disentangled Representation Learning for Text-Video Retrieval

Disentangled Representation Learning for Text-Video Retrieval This is a PyTorch implementation of the paper Disentangled Representation Learning for T

Qiang Wang 49 Dec 18, 2022