PyTorch implementation of "VRT: A Video Restoration Transformer"

Overview

VRT: A Video Restoration Transformer

Jingyun Liang, Jiezhang Cao, Yuchen Fan, Kai Zhang, Rakesh Ranjan, Yawei Li, Radu Timofte, Luc Van Gool

Computer Vision Lab, ETH Zurich & Meta Inc.


arxiv | supplementary | pretrained models | visual results


This repository is the official PyTorch implementation of "VRT: A Video Restoration Transformer" (arxiv, supp, pretrained models, visual results). VRT achieves state-of-the-art performance (outperforming previous methods by up to 2.16 dB) in

  • video SR (REDS, Vimeo90K, Vid4 and UDM10)
  • video deblurring (GoPro, DVD and REDS)
  • video denoising (DAVIS and Set8)

🚀 🚀 🚀 News:

  • Jan. 26, 2022: See our previous works on
      • Transformer-based image restoration: SwinIR: Image Restoration Using Swin Transformer, ICCVW 2021
      • Real-world image SR: Designing a Practical Degradation Model for Deep Blind Image Super-Resolution, ICCV 2021
      • Normalizing flow-based image SR and image rescaling: Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling, ICCV 2021
      • Blind image SR: Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution, ICCV 2021
      • Blind image SR: Flow-based Kernel Prior with Application to Blind Super-Resolution, CVPR 2021

Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames. Unlike single image restoration, video restoration generally requires utilizing temporal information from multiple adjacent, but usually misaligned, video frames. Existing deep methods generally tackle this with a sliding-window strategy or a recurrent architecture, which are either restricted to frame-by-frame restoration or lack long-range modelling ability. In this paper, we propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities. More specifically, VRT is composed of multiple scales, each of which consists of two kinds of modules: temporal mutual self attention (TMSA) and parallel warping. TMSA divides the video into small clips, on which mutual attention is applied for joint motion estimation, feature alignment and feature fusion, while self-attention is used for feature extraction. To enable cross-clip interactions, the video sequence is shifted for every other layer. In addition, parallel warping is used to further fuse information from neighboring frames by parallel feature warping. Experimental results on three tasks, including video super-resolution, video deblurring and video denoising, demonstrate that VRT outperforms the state-of-the-art methods by large margins (up to 2.16 dB) on nine benchmark datasets.
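
To make the TMSA idea more concrete, below is a minimal, hypothetical PyTorch sketch of the two ingredients described above: frames are grouped into 2-frame clips, attention is computed jointly over the tokens of each clip (covering both mutual and self attention), and every other layer the sequence is shifted by one frame so that neighboring clips can interact. It is a toy illustration only, not the TMSA module in network_vrt.py, which is considerably more involved.

# Toy sketch of the TMSA clip partition + temporal shift (illustrative only).
import torch
import torch.nn as nn

class TinyTMSALayer(nn.Module):
    """Joint attention over 2-frame clips; `shift=True` rolls the sequence by one frame."""
    def __init__(self, dim, heads=4, shift=False):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.shift = shift

    def forward(self, x):                            # x: (B, T, N, C), T even, N tokens per frame
        b, t, n, c = x.shape
        if self.shift:                               # shift by one frame -> cross-clip interaction
            x = torch.roll(x, shifts=1, dims=1)
        clips = x.reshape(b * t // 2, 2 * n, c)      # merge the tokens of each 2-frame clip
        y = self.norm(clips)
        y, _ = self.attn(y, y, y)                    # attention within the clip
        clips = clips + y                            # residual connection
        x = clips.reshape(b, t, n, c)
        if self.shift:
            x = torch.roll(x, shifts=-1, dims=1)     # undo the shift
        return x

# toy usage: 8 frames, 16 tokens per frame, 32 channels
x = torch.randn(1, 8, 16, 32)
layers = nn.Sequential(TinyTMSALayer(32, shift=False), TinyTMSALayer(32, shift=True))
print(layers(x).shape)  # torch.Size([1, 8, 16, 32])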

Contents

  1. Requirements
  2. Quick Testing
  3. Training
  4. Results
  5. Citation
  6. License and Acknowledgement

Requirements

  • Python 3.8, PyTorch >= 1.9.1
  • Requirements: see requirements.txt
  • Platforms: Ubuntu 18.04, cuda-11.1

Quick Testing

The following commands download the pretrained models and test datasets automatically (except the Vimeo-90K testing set). If you run out of GPU memory, reduce --tile at the expense of slightly decreased performance.

# download code
git clone https://github.com/JingyunLiang/VRT
cd VRT
pip install -r requirements.txt

# 001, video sr trained on REDS (6 frames), tested on REDS4
python main_test_vrt.py --task 001_VRT_videosr_bi_REDS_6frames --folder_lq testsets/REDS4/sharp_bicubic --folder_gt testsets/REDS4/GT --tile 40 128 128 --tile_overlap 2 20 20

# 002, video sr trained on REDS (16 frames), tested on REDS4
python main_test_vrt.py --task 002_VRT_videosr_bi_REDS_16frames --folder_lq testsets/REDS4/sharp_bicubic --folder_gt testsets/REDS4/GT --tile 40 128 128 --tile_overlap 2 20 20

# 003, video sr trained on Vimeo (bicubic), tested on Vid4 and Vimeo
python main_test_vrt.py --task 003_VRT_videosr_bi_Vimeo_7frames --folder_lq testsets/Vid4/BIx4 --folder_gt testsets/Vid4/GT --tile 32 128 128 --tile_overlap 2 20 20
python main_test_vrt.py --task 003_VRT_videosr_bi_Vimeo_7frames --folder_lq testsets/vimeo90k/vimeo_septuplet_matlabLRx4/sequences --folder_gt testsets/vimeo90k/vimeo_septuplet/sequences --tile 8 0 0 --tile_overlap 0 20 20

# 004, video sr trained on Vimeo (blur-downsampling), tested on Vid4, UDM10 and Vimeo
python main_test_vrt.py --task 004_VRT_videosr_bd_Vimeo_7frames --folder_lq testsets/Vid4/BDx4 --folder_gt testsets/Vid4/GT --tile 32 128 128 --tile_overlap 2 20 20
python main_test_vrt.py --task 004_VRT_videosr_bd_Vimeo_7frames --folder_lq testsets/UDM10/BDx4 --folder_gt testsets/UDM10/GT --tile 32 128 128 --tile_overlap 2 20 20
python main_test_vrt.py --task 004_VRT_videosr_bd_Vimeo_7frames --folder_lq testsets/vimeo90k/vimeo_septuplet_BDLRx4/sequences --folder_gt testsets/vimeo90k/vimeo_septuplet/sequences --tile 8 0 0 --tile_overlap 0 20 20

# 005, video deblurring trained and tested on DVD
python main_test_vrt.py --task 005_VRT_videodeblurring_DVD --folder_lq testsets/DVD10/test_GT_blurred --folder_gt testsets/DVD10/test_GT --tile 12 256 256 --tile_overlap 2 20 20

# 006, video deblurring trained and tested on GoPro
python main_test_vrt.py --task 006_VRT_videodeblurring_GoPro --folder_lq testsets/GoPro11/test_GT_blurred --folder_gt testsets/GoPro11/test_GT --tile 18 192 192 --tile_overlap 2 20 20

# 007, video deblurring trained on REDS, tested on REDS4
python main_test_vrt.py --task 007_VRT_videodeblurring_REDS --folder_lq testsets/REDS4/blur --folder_gt testsets/REDS4/GT --tile 12 256 256 --tile_overlap 2 20 20

# 008, video denoising trained on DAVIS (noise level 0-50) and tested on Set8 and DAVIS
python main_test_vrt.py --task 008_VRT_videodenoising_DAVIS --sigma 10 --folder_lq testsets/Set8 --folder_gt testsets/Set8 --tile 12 256 256 --tile_overlap 2 20 20
python main_test_vrt.py --task 008_VRT_videodenoising_DAVIS --sigma 10  --folder_lq testsets/DAVIS-test --folder_gt testsets/DAVIS-test --tile 12 256 256 --tile_overlap 2 20 20
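
The --tile D H W option restores the video in overlapping spatio-temporal patches instead of processing the whole clip at once, and --tile_overlap controls how much neighboring patches overlap; smaller tiles need less memory but see less context, which is why performance can drop slightly. The sketch below illustrates the idea under simplifying assumptions (a generic model, positive tile sizes, overlaps smaller than the tiles); the actual stitching in main_test_vrt.py differs in its details.

# Simplified illustration of tiled inference (not the code in main_test_vrt.py).
import torch

@torch.no_grad()
def tiled_inference(model, lq, tile=(12, 256, 256), overlap=(2, 20, 20), scale=1):
    """lq: (B, T, C, H, W) low-quality clip; tile/overlap mimic --tile/--tile_overlap."""
    b, t, c, h, w = lq.shape
    td, th, tw = min(tile[0], t), min(tile[1], h), min(tile[2], w)
    sd, sh, sw = td - overlap[0], th - overlap[1], tw - overlap[2]   # tile strides
    out = torch.zeros(b, t, c, h * scale, w * scale, device=lq.device)
    cnt = torch.zeros_like(out)
    for d0 in list(range(0, t - td, sd)) + [t - td]:
        for h0 in list(range(0, h - th, sh)) + [h - th]:
            for w0 in list(range(0, w - tw, sw)) + [w - tw]:
                patch = lq[:, d0:d0 + td, :, h0:h0 + th, w0:w0 + tw]
                restored = model(patch)              # (B, td, C, th*scale, tw*scale)
                region = (slice(None), slice(d0, d0 + td), slice(None),
                          slice(h0 * scale, (h0 + th) * scale),
                          slice(w0 * scale, (w0 + tw) * scale))
                out[region] += restored
                cnt[region] += 1
    return out / cnt                                 # average the overlapping predictions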

All visual results of VRT can be downloaded here.

Training

The training and testing sets are as follows (see the supplementary for a detailed introduction of all datasets). For better I/O speed, use create_lmdb.py to convert .png datasets to .lmdb datasets.

Note: You do NOT need to prepare the datasets if you just want to test the model. main_test_vrt.py will download the testing set automatically.
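
As a rough illustration of what such a conversion does, a folder of PNG frames can be packed into an LMDB database along the following lines; the key naming and meta information written by the official create_lmdb.py may differ, so treat this only as a sketch of the idea.

# Hypothetical sketch of packing PNG frames into LMDB for faster I/O (not create_lmdb.py).
import glob, os
import cv2
import lmdb

def pngs_to_lmdb(png_dir, lmdb_path):
    paths = sorted(glob.glob(os.path.join(png_dir, '**', '*.png'), recursive=True))
    # map_size must be large enough to hold all encoded images
    map_size = int(sum(os.path.getsize(p) for p in paths) * 1.5) + 10 ** 7
    env = lmdb.open(lmdb_path, map_size=map_size)
    with env.begin(write=True) as txn:               # single write transaction
        for p in paths:
            img = cv2.imread(p, cv2.IMREAD_UNCHANGED)
            ok, buf = cv2.imencode('.png', img)      # store losslessly re-encoded PNG bytes
            assert ok, f'failed to encode {p}'
            key = os.path.relpath(p, png_dir).encode()   # e.g. b'000/00000000.png'
            txn.put(key, buf.tobytes())
    env.close()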

  • video SR (setting 1, BI)
    Training set: REDS sharp & sharp_bicubic (266 videos, 266000 frames: train + val except REDS4). *Use regroup_reds_dataset.py to regroup and rename the REDS val set.
    Testing set: REDS4 (4 videos, 400 frames: 000, 011, 015, 020 of REDS)
    Pretrained model and visual results: here

  • video SR (settings 2 & 3, BI & BD)
    Training set: Vimeo90K (64612 seven-frame videos as in sep_trainlist.txt). *Use generate_LR_Vimeo90K.m and generate_LR_Vimeo90K_BD.m to generate LR frames for bicubic and blur-downsampling VSR, respectively.
    Testing set: Vimeo90K-T (the remaining 7824 seven-frame videos) + Vid4 (4 videos) + UDM10 (10 videos). *Use prepare_UDM10.py to regroup and rename the UDM10 dataset.
    Pretrained model and visual results: here

  • video deblurring (setting 1, motion blur)
    Training set: DVD (61 videos, 5708 frames). *Use prepare_DVD.py to regroup and rename the dataset.
    Testing set: DVD (10 videos, 1000 frames). *Use evaluate_video_deblurring.m for final evaluation.
    Pretrained model and visual results: here

  • video deblurring (setting 2, motion blur)
    Training set: GoPro (22 videos, 2103 frames). *Use prepare_GoPro_as_video.py to regroup and rename the dataset.
    Testing set: GoPro (11 videos, 1111 frames). *Use evaluate_video_deblurring.m for final evaluation.
    Pretrained model and visual results: here

  • video deblurring (setting 3, motion blur)
    Training set: REDS sharp & blur (266 videos, 266000 frames: train + val except REDS4). *Use regroup_reds_dataset.py to regroup and rename the REDS val set. Note that it shares the same HQ frames as in VSR.
    Testing set: REDS4 (4 videos, 400 frames: 000, 011, 015, 020 of REDS)
    Pretrained model and visual results: here

  • video denoising (Gaussian noise)
    Training set: DAVIS-2017 (90 videos, 6208 frames). *Use all files in DAVIS/JPEGImages/480p.
    Testing set: DAVIS-2017-test (30 videos) + Set8 (8 videos: tractor, touchdown, park_joy and sunflower selected from DERF, plus hypersmooth, motorbike, rafting and snowboard from GOPRO_540P)
    Pretrained model and visual results: here

The training code will be put in KAIR.

Results

We achieved state-of-the-art performance on video SR, video deblurring and video denoising. Detailed results can be found in the paper.

Video Super-Resolution

Video Deblurring

Video Denoising

Citation

@article{liang2022vrt,
    title={VRT: A Video Restoration Transformer},
    author={Liang, Jingyun and Cao, Jiezhang and Fan, Yuchen and Zhang, Kai and Ranjan, Rakesh and Li, Yawei and Timofte, Radu and Van Gool, Luc},
    journal={arXiv preprint arXiv:2201.12288},
    year={2022}
}

License and Acknowledgement

This project is released under the CC-BY-NC license. It builds on code from KAIR, BasicSR, Video Swin Transformer and mmediting; thanks for their awesome work. The majority of VRT is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: KAIR is licensed under the MIT License, while BasicSR, Video Swin Transformer and mmediting are licensed under the Apache 2.0 License.

Comments
  • Problem of "use_checkpoint_attn"

    I am trying to reimplement the training part, but I encounter the following problem when I set the parameter 'use_checkpoint_attn' in self.residual_group2 to True. Could you suggest a solution?

    opened by wlj961012 8
  • RuntimeError expected input... to have 28 channels, but got 27 channels instead

    I am getting this error on my own test data (with task 008_VRT_videodenoising_DAVIS)

    RuntimeError: Given groups=1, weight of size [96, 28, 1, 3, 3], expected input[1, 27, 32, 128, 128] to have 28 channels, but got 27 channels instead

    Full stack:

    File "C:\Dev\VRT\models\network_vrt.py", line 1395, in forward
        x = self.conv_first(x.transpose(1, 2))
    File "C:\tools\miniconda3\envs\pt\lib\site-packages\torch\nn\modules\module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
    File "C:\tools\miniconda3\envs\pt\lib\site-packages\torch\nn\modules\conv.py", line 590, in forward
        return self._conv_forward(input, self.weight, self.bias)
    File "C:\tools\miniconda3\envs\pt\lib\site-packages\torch\nn\modules\conv.py", line 585, in _conv_forward
        return F.conv3d(
    RuntimeError: Given groups=1, weight of size [96, 28, 1, 3, 3], expected input[1, 27, 32, 128, 128] to have 28 channels, but got 27 channels instead

    opened by ialhashim 6
  • I tested video SR with SwinIR and VRT, but SwinIR performs better. Is it normal?

    VRT testing command

    CUDA_VISIBLE_DEVICES=9 \
    python main_test_vrt.py --task 002_VRT_videosr_bi_REDS_16frames \
                            --folder_lq /home/liao/cjj/dataset/test/LR \
                            --folder_gt /home/liao/cjj/dataset/test/GT \
                            --tile 10 128 128 \
                            --tile_overlap 2 20 20
    

    SwinIR model: 001_classicalSR_DIV2K_s48w8_SwinIR-M_x4

    Video for testing: https://cowtransfer.com/s/1739646a86874e

    Result

    SwinIR
         | 1       | 2       | 3       | 4       | Average
    PSNR | 26.9603 | 31.9831 | 33.0922 | 33.2781 | 31.32843
    SSIM | 0.7353  | 0.9022  | 0.8842  | 0.9233  | 0.86125

    VRT
         | 1       | 2       | 3       | 4       | Average
    PSNR | 26.7961 | 31.7153 | 30.7655 | 34.3461 | 30.90575
    SSIM | 0.7272  | 0.8931  | 0.8724  | 0.9385  | 0.8578

    opened by cjj490168650 5
  • How to run inference on larger frames e.g. 360p?

    Hi! Thanks for the great work with VRT. I wanted to know if you have any tips or recommendations on how we can run your evaluation code on our own higher-resolution frames. From my tests, anything above 180p just runs OOM on a K80 (12 GB) and a T4 (16 GB), regardless of the tile size I use, for all models (REDS, Vimeo, etc.). Do you have any advice? Thanks!

    opened by machinelearnear 5
  • Same error, solution didn't work: RuntimeError expected input... to have 28 channels, but got 27 channels instead

    I ran into the same error as #14 and verified that self.nonblind_denoising was set to True here, but I still receive the error:

    line 585, in _conv_forward
        return F.conv3d(
    RuntimeError: Given groups=1, weight of size [96, 28, 1, 3, 3], expected input[1, 27, 40, 128, 128] to have 28 channels, but got 27 channels instead
    

    This is using the dataset VRT/testsets/REDS4/sharp_bicubic via the call python main_test_vrt.py --task 008_VRT_videodenoising_DAVIS --folder_lq testsets/REDS4/sharp_bicubic --tile 40 128 128 --tile_overlap 2 20 20. I ultimately want to run this on my own folder of PNGs from a video.

    opened by dkoslicki 4
  • Request: training setting recommendations for ×4 VSR

    If I only have 2 or 4 RTX 3090s and want to train a model for ×4 VSR, how should I set the training parameters effectively, i.e., with no OOM, no large performance drop, and moderate training time?

    For example, there are two parameters that enable checkpointing to save CUDA memory, use_checkpoint_attn and use_checkpoint_ffn; which one has the larger influence on training time and memory consumption?

    Looking forward to your reply, thank you.

    opened by LuoXin-s 3
  • Testing fails in network_vrt.py @ get_flow_4frames, flows_forward[0].shape[1]

    Hi, I've been trying to use this code in combination with cszn/KAIR to train a VRT model on my own data with a custom dataloader I wrote. Unfortunately, I'm running into an error in the testing phase in get_flow_4frames because the shape of flows_forward[0] is torch.Size([1, 0, 2, 64, 64]).

    The X input into forward is: torch.Size([1, 1, 3, 64, 64])
    The X input into get_flows is: torch.Size([1, 1, 3, 64, 64])
    The X input into get_flow_2frames is: torch.Size([1, 1, 3, 64, 64])
    The forward_flows[0] is as previously specified: torch.Size([1, 0, 2, 64, 64])

    def get_flow_4frames(self, flows_forward, flows_backward):
        '''Get flow between t and t+2 from (t,t+1) and (t+1,t+2).'''

        # backward
        d = flows_forward[0].shape[1]
        flows_backward2 = []
        for flows in flows_backward:
            flow_list = []
            for i in range(d - 1, 0, -1):
                flow_n1 = flows[:, i - 1, :, :, :]  # flow from i+1 to i
                flow_n2 = flows[:, i, :, :, :]  # flow from i+2 to i+1
                flow_list.insert(0, flow_n1 + flow_warp(flow_n2, flow_n1.permute(0, 2, 3, 1)))  # flow from i+2 to i
            if len(flow_list) != 0:
                flows_backward2.append(torch.stack(flow_list, 1))


    The training is working without any issues.

    Is this the anticipated behavior within the code or is there something regarding the test settings that I'm missing?

    opened by amrosado 2
  • Memory consumption while training

    Hi, congrats on this cool work! I'm trying to train your model, but I only have 2 A100 GPUs, so memory is limited. I wonder how much memory is needed to train models like "003_VRT_videosr_bi_Vimeo_7frames.pth" and "006_VRT_videodeblurring_GoPro.pth"?

    opened by xg416 2
  • A few questions about the paper 😸

    According to the paper, "The runtime is 2.2s per frame on 1280×720 blurred videos." Which GPU did you use to measure the runtime?
    I also have a question about model size: did you try smaller model variants (as is popular in modern transformers, e.g., VRT-S / VRT-L with different parameter counts), or is the architecture constrained and does not converge with custom sizes?

    And of course, congrats on a cool paper 📦

    opened by machineko 2
  • Torch.distributed.elastic.multiprocessing.api.SignalException: Process XXXX got signal :1

    Hello, thank you for the code. I ran into an error when training with 005_train_vrt_videodeblurring_dvd.json:

    Fix keys: ['spynet', 'deform'] for the first 20000 iters.
    Fix keys: ['spynet', 'deform'] for the first 20000 iters.
    22-09-01 02:31:11.512 : <epoch: 0, iter: 400, lr:4.000e-04> G_loss: 7.544e-02
    22-09-01 02:48:36.264 : <epoch: 0, iter: 600, lr:4.000e-04> G_loss: 1.637e-02
    22-09-01 03:06:01.631 : <epoch: 0, iter: 800, lr:4.000e-04> G_loss: 7.941e-02
    WARNING:torch.distributed.elastic.agent.server.api:Received 1 death signal, shutting down workers
    WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2704351 closing signal SIGHUP
    WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2704352 closing signal SIGHUP
    Traceback (most recent call last):
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
        main()
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
        launch(args)
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
        run(args)
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/run.py", line 755, in run
        )(*cmd_args)
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 236, in launch_agent
        result = agent.run()
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/elastic/metrics/api.py", line 125, in wrapper
        result = f(*args, **kwargs)
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/elastic/agent/server/api.py", line 709, in run
        result = self._invoke_run(role)
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/elastic/agent/server/api.py", line 850, in _invoke_run
        time.sleep(monitor_interval)
      File "/home/ET/huiyuxiang/miniconda3/envs/deblur/lib/python3.7/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 60, in _terminate_process_handler
        raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
    torch.distributed.elastic.multiprocessing.api.SignalException: Process 2704341 got signal: 1

    I am using Python 3.7.13 and PyTorch 1.12.1.

    opened by timerobin 1
  • VRT 2x upscale

    I was wondering if the authors have any suggestions for fine-tuning the VRT model to do a 2x upscale instead of a 4x upscale. I removed some layers from the Upsample module to support 2x upscaling; however, the forward/backward pass consumes too much VRAM. Which layers do you suggest removing from the model to reduce its complexity while still achieving good results for a 2x upscale?

    Currently, I have tried 2x upscale training with 1 GPU, batch size =1, low quality frames crop size = 64x64, and high quality frames crop size = 128x128. The maximum VRAM usage in the forward pass/backward pass is 23GB.

    opened by ArfaSaif 1
  • How can we denoise a video sequence with only LR input and no GT?

    Great research, thank you! I want to test the denoising effect on my own dataset, which has only noisy inputs and no corresponding GT. How can I do this? Thanks!

    opened by haikunzhang95 0
  • FileNotFoundError

    After 4 successful cells:

    FileNotFoundError                         Traceback (most recent call last)
    <ipython-input-5-b2c3ee9af109> in <module>
          4   os.remove(zip_filename)
          5 os.system(f"zip -r -j {zip_filename} results/*")
    ----> 6 files.download(zip_filename)
    
    /usr/local/lib/python3.7/dist-packages/google/colab/files.py in download(filename)
        207   if not _os.path.exists(filename):
        208     msg = 'Cannot find file: {}'.format(filename)
    --> 209     raise FileNotFoundError(msg)  # pylint: disable=undefined-variable
        210 
        211   comm_manager = _IPython.get_ipython().kernel.comm_manager
    
    FileNotFoundError: Cannot find file: VRT_result.zip
    
    opened by Drjacky 0
  • Out-of-memory problem during training

    Hi, I am currently trying to train the VRT model, but I ran into an out-of-memory problem.

    I am using an NVIDIA RTX 2080 Ti with 11 GB of device memory for training, so I reduced the batch size from 8 to 4 and enabled torch.utils.checkpoint (use_checkpoint_attn: true, use_checkpoint_ffn: true, gt_size=256).

    However, I still run out of memory (cannot allocate memory).

    Is it possible to train using an RTX 2080 Ti?

    opened by mapsosa84 0
  • Log Files from Training

    Hello,

    Thank you for your awesome code!

    I am hoping you might open-source the log files you have from training. Maybe the training and validation loss as a function of epoch (and/or batch) with an estimate of the runtime?

    opened by gauenk 0
  • Inference Taking Forever

    I am trying to deblur a 150-frame video on a machine with two NVIDIA RTX A5000 GPUs using the GoPro deblurring model, and I reduced the tile value, but the operation is taking forever. How can I solve this? Is an NVIDIA RTX A5000 enough for inference?

    opened by pentanol2 1