[ACM MM 2021] Diverse Image Inpainting with Bidirectional and Autoregressive Transformers

Last update: Nov 09, 2022

Installation

pip install -r requirements.txt

Dataset Preparation

Given a dataset, list the image paths in file lists placed in a folder named after the dataset, with the following folder structure.

    flist/dataset_name
        ├── train.flist    # paths of training images
        ├── valid.flist    # paths of validation images
        └── test.flist     # paths of testing images
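The file lists above could be generated with a short script like the following. This is a minimal sketch, not part of the repository: it assumes each .flist file holds one image path per line, and the function names, split fractions, and accepted extensions are hypothetical choices for illustration.

```python
import random
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def split_paths(paths, valid_frac=0.05, test_frac=0.05, seed=0):
    """Shuffle paths deterministically and split them into train/valid/test."""
    paths = sorted(paths)
    random.Random(seed).shuffle(paths)
    n_valid = round(len(paths) * valid_frac)
    n_test = round(len(paths) * test_frac)
    return {
        "valid": paths[:n_valid],
        "test": paths[n_valid:n_valid + n_test],
        "train": paths[n_valid + n_test:],
    }


def write_flists(image_dir, flist_root, dataset_name, **split_kwargs):
    """Write flist_root/dataset_name/{train,valid,test}.flist, one path per line (assumed format)."""
    paths = [
        str(p.resolve())
        for p in Path(image_dir).rglob("*")
        if p.suffix.lower() in IMAGE_EXTS
    ]
    out_dir = Path(flist_root) / dataset_name
    out_dir.mkdir(parents=True, exist_ok=True)
    splits = split_paths(paths, **split_kwargs)
    for name, items in splits.items():
        text = "\n".join(items) + ("\n" if items else "")
        (out_dir / f"{name}.flist").write_text(text)
    return splits
```

For example, write_flists("data/celebahq", "./flist", "celebahq") would create ./flist/celebahq/train.flist and its siblings; the data/celebahq directory is a placeholder.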

In this work, we use CelebA-HQ (download available here), Places2 (download available here), and Paris StreetView (the authors' permission is needed to download).

ImageNet K-means Cluster: The kmeans_centers.npy file is downloaded from image-gpt; it is used to quantize the low-resolution images.
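To illustrate what quantizing with k-means centers means, here is a minimal sketch that maps each pixel to the index of its nearest color center. It is not the repository's code: it assumes the centers form a (K, 3) array in the same value range as the image, and the function names are hypothetical. In practice the centers would come from np.load("kmeans_centers.npy").

```python
import numpy as np


def quantize_to_clusters(image, centers):
    """Map each pixel of an (H, W, 3) image to the index of its nearest (K, 3) center."""
    pixels = image.reshape(-1, 3).astype(np.float32)
    # Squared Euclidean distance from every pixel to every center: shape (H*W, K).
    dists = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).reshape(image.shape[:2])


def dequantize(indices, centers):
    """Turn an (H, W) index map back into an (H, W, 3) image of center colors."""
    return centers[indices]


# Toy usage with two made-up centers (black and white).
toy_centers = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], dtype=np.float32)
toy_image = np.array([[[0.1, 0.1, 0.1], [0.9, 0.8, 1.0]]], dtype=np.float32)
toy_indices = quantize_to_clusters(toy_image, toy_centers)
```

The distance matrix grows as H*W*K, so this brute-force form is only practical for low-resolution inputs.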

Testing with Pre-trained Models

Download pre-trained models:

CelebA-HQ: BAT ; Upsampler
Places2: BAT ; Upsampler
Paris-StreetView: BAT ; Upsampler

Put the pre-trained model under the checkpoints folder, e.g.

    checkpoints
        └── celebahq_bat_pretrain
            └── latest_net_G.pth
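Before sampling, it can help to confirm that the checkpoints sit where the layout above expects them. The following is a minimal sketch with hypothetical helper names; it assumes every model folder contains a file named latest_net_G.pth, which the layout above shows only for the BAT model.

```python
from pathlib import Path


def checkpoint_path(name, root="checkpoints", filename="latest_net_G.pth"):
    """Build the expected path root/name/filename for a model folder."""
    return Path(root) / name / filename


def missing_checkpoints(names, root="checkpoints"):
    """Return the model names whose checkpoint file is not present."""
    return [n for n in names if not checkpoint_path(n, root).is_file()]
```

Calling missing_checkpoints(["celebahq_bat_pretrain"]) returns an empty list when the file is in place.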

Prepare the input images and masks for testing, then run:

python bat_sample.py --num_sample [1] --tran_model [bat name] --up_model [upsampler name] --input_dir [dir of input] --mask_dir [dir of mask] --save_dir [dir to save results]
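As an illustration of filling in the placeholders, the sketch below assembles the sampling command as an argument list. The flags are the ones shown in the command above; the helper name, the upsampler name celebahq_up_pretrain, and the directory names are assumptions made for the example.

```python
import shlex


def build_sample_command(tran_model, up_model, input_dir, mask_dir, save_dir, num_sample=1):
    """Assemble the bat_sample.py invocation as a list suitable for subprocess.run."""
    return [
        "python", "bat_sample.py",
        "--num_sample", str(num_sample),
        "--tran_model", tran_model,
        "--up_model", up_model,
        "--input_dir", input_dir,
        "--mask_dir", mask_dir,
        "--save_dir", save_dir,
    ]


# Hypothetical model and directory names, for illustration only.
example_cmd = build_sample_command(
    "celebahq_bat_pretrain", "celebahq_up_pretrain",
    "./examples/images", "./examples/masks", "./results",
)
example_cmd_str = shlex.join(example_cmd)  # printable shell form of the command
```

The list can be passed to subprocess.run(example_cmd, check=True) from the repository root.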

Training New Models

Download the pretrained VGG model from here and move it to models/. This model is used to calculate the training loss for the upsampler.

New models can be trained with the following commands.

Prepare dataset. Use the --dataroot option to locate the directory of file lists, e.g. ./flist, and specify the dataset to train on with the --dataset_name option. Specify the mask type and mask ratio using the --mask_type and --pconv_level options.
Train the transformer.

# Specify your own dataset or settings in the bash file first.
bash train_bat.sh

Please note that some of the transformer settings are defined in train_bat.py instead of options/. This script uses every available GPU for training, so select the GPUs via CUDA_VISIBLE_DEVICES rather than --gpu_ids, which is used for the upsampler.
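One way to restrict the transformer training to specific GPUs is to set CUDA_VISIBLE_DEVICES in the environment of the launched process. The sketch below only builds that environment; the helper name is hypothetical and the GPU indices are an example.

```python
import os


def training_env(gpu_ids, base_env=None):
    """Copy the environment and expose only the given GPU indices to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    return env


# Example: make only GPUs 0 and 1 visible to the training script.
env = training_env([0, 1])
# To launch training from the repository root:
#   import subprocess
#   subprocess.run(["bash", "train_bat.sh"], env=env, check=True)
```

Inside the launched process the selected devices are renumbered starting from 0.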

Train the upsampler.

# Specify your own dataset or settings in the bash file first.
bash train_up.sh

The upsampler is typically trained on the low-resolution ground truth. We find that using some samples from the trained BAT might help improve performance, i.e. PSNR and SSIM. However, the sampling process is quite time-consuming, and training with ground truth alone can also yield reasonable results.
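For reference, PSNR can be computed from the mean squared error between a ground-truth image and a result. This is a minimal sketch of the standard formula, assuming 8-bit images with a peak value of 255; it is not necessarily the evaluation protocol used for the reported numbers.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Higher values indicate a result closer to the ground truth.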

Citation

If you find this code helpful for your research, please cite our paper.

@inproceedings{yu2021diverse,
  title={Diverse Image Inpainting with Bidirectional and Autoregressive Transformers},
  author={Yu, Yingchen and Zhan, Fangneng and Wu, Rongliang and Pan, Jianxiong and Cui, Kaiwen and Lu, Shijian and Ma, Feiying and Xie, Xuansong and Miao, Chunyan},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  year={2021}
}

Acknowledgments

This code borrows heavily from SPADE and minGPT. We appreciate the authors for sharing their code.

