MUGE Text To Image Generation Baseline

Requirements and Installation

See fairseq for more details. Briefly:

python == 3.6.4
pytorch == 1.7.1
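
As a quick sanity check before installing, the pinned versions above can be compared with the active environment. This is a minimal sketch, not part of the repository: the function name `version_mismatches` is hypothetical, and it assumes that a local build suffix such as `+cu110` on the PyTorch version string is acceptable.

```python
import sys

# Versions listed in this README.
REQUIRED_PYTHON = (3, 6, 4)
REQUIRED_TORCH = "1.7.1"


def version_mismatches(python_version, torch_version):
    """Return human-readable mismatches against the pinned versions."""
    problems = []
    if tuple(python_version[:3]) != REQUIRED_PYTHON:
        problems.append("python %s != %s" % (
            ".".join(map(str, python_version[:3])),
            ".".join(map(str, REQUIRED_PYTHON))))
    if torch_version is None:
        problems.append("pytorch is not installed")
    # Drop a local build suffix such as "+cu110" before comparing (assumption).
    elif torch_version.split("+")[0] != REQUIRED_TORCH:
        problems.append("pytorch %s != %s" % (torch_version, REQUIRED_TORCH))
    return problems


if __name__ == "__main__":
    try:
        import torch
        installed_torch = torch.__version__
    except ImportError:
        installed_torch = None
    for line in version_mismatches(sys.version_info, installed_torch):
        print("WARNING:", line)
```

An empty result means the environment matches the versions listed above.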

Installing fairseq and other requirements

git clone https://github.com/MUGE-2021/image-caption-baseline
cd muge_baseline/
pip install -r requirements.txt
cd fairseq/
pip install --editable .

Download the data and place it in the dataset/ directory. The file structure is:

text2image-baseline
    - dataset
        - ECommerce-T2I
            - T2I_train.img.tsv
            - T2I_train.text.tsv
            - ...
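
Before training, it may help to confirm that the files are where the scripts expect them. The sketch below checks only the two file names shown in the tree above; the helper name `missing_dataset_files` is hypothetical, and the files hidden behind the `...` entry are not checked because they are not listed here.

```python
import os

# File names taken from the tree above; the "..." entry there means the real
# dataset contains further files that are not listed in this README.
EXPECTED_FILES = ("T2I_train.img.tsv", "T2I_train.text.tsv")


def missing_dataset_files(repo_root):
    """Return the expected dataset paths that do not exist under repo_root."""
    data_dir = os.path.join(repo_root, "dataset", "ECommerce-T2I")
    paths = [os.path.join(data_dir, name) for name in EXPECTED_FILES]
    return [p for p in paths if not os.path.isfile(p)]


if __name__ == "__main__":
    # Run from the repository root.
    for path in missing_dataset_files("."):
        print("missing:", path)
```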

Getting Started

The model is a BART-like model that uses VQGAN as an image tokenizer. See models/t2i_baseline.py for the detailed model structure.
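
To illustrate what "image tokenizer" means here: a VQGAN-style tokenizer replaces each encoder feature vector with the index of its nearest codebook entry, so an image becomes a sequence of discrete tokens that a sequence-to-sequence model can predict from text. The toy sketch below shows only that lookup step in plain Python. The codebook values, feature vectors, and function names are hypothetical and are not taken from models/t2i_baseline.py.

```python
def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to vector (squared L2)."""
    def sq_dist(entry):
        return sum((v - e) ** 2 for v, e in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def tokenize(features, codebook):
    """Map a row-major list of feature vectors to a list of discrete token ids."""
    return [nearest_code(vec, codebook) for vec in features]


# Hypothetical 2-D codebook with three entries.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# Hypothetical encoder output for a 2x2 feature grid, flattened row by row.
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.6, 0.1)]

tokens = tokenize(features, codebook)
print(tokens)  # [0, 1, 2, 1]
```

A real VQGAN learns the codebook jointly with a convolutional encoder and decoder; the nearest-neighbor lookup is the part that turns continuous features into tokens.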

Training

cd run_scripts/; bash train_t2i_vqgan.sh

Model training takes about 5 hours.

Inference

cd run_scripts/; bash generate_t2i_vqgan.sh

See the results in the results/ directory.

Reference

@inproceedings{M6,
  author    = {Junyang Lin and
               Rui Men and
               An Yang and
               Chang Zhou and
               Ming Ding and
               Yichang Zhang and
               Peng Wang and
               Ang Wang and
               Le Jiang and
               Xianyan Jia and
               Jie Zhang and
               Jianwei Zhang and
               Xu Zou and
               Zhikang Li and
               Xiaodong Deng and
               Jie Liu and
               Jinbao Xue and
               Huiling Zhou and
               Jianxin Ma and
               Jin Yu and
               Yong Li and
               Wei Lin and
               Jingren Zhou and
               Jie Tang and
               Hongxia Yang},
  title     = {{M6:} {A} Chinese Multimodal Pretrainer},
  year      = {2021},
  booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
  pages     = {3251--3261},
  numpages  = {11},
  location  = {Virtual Event, Singapore},
}

@article{M6-T,
  author    = {An Yang and
               Junyang Lin and
               Rui Men and
               Chang Zhou and
               Le Jiang and
               Xianyan Jia and
               Ang Wang and
               Jie Zhang and
               Jiamang Wang and
               Yong Li and
               Di Zhang and
               Wei Lin and
               Lin Qu and
               Jingren Zhou and
               Hongxia Yang},
  title     = {{M6-T:} Exploring Sparse Expert Models and Beyond},
  journal   = {CoRR},
  volume    = {abs/2105.15082},
  year      = {2021}
}
