TensorFlow Implementation of "Show, Attend and Tell"

Overview

Show, Attend and Tell

Update (December 2, 2016) TensorFlow implementation of Show, Attend and Tell: Neural Image Caption Generation with Visual Attention which introduces an attention based image caption generator. The model changes its attention to the relevant part of the image while it generates each word.


alt text


References

Author's theano code: https://github.com/kelvinxu/arctic-captions

Another tensorflow implementation: https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow


Getting Started

Prerequisites

First, clone this repo and pycocoevalcap in same directory.

$ git clone https://github.com/yunjey/show-attend-and-tell-tensorflow.git
$ git clone https://github.com/tylin/coco-caption.git

This code is written in Python2.7 and requires TensorFlow 1.2. In addition, you need to install a few more packages to process MSCOCO data set. I have provided a script to download the MSCOCO image dataset and VGGNet19 model. Downloading the data may take several hours depending on the network speed. Run commands below then the images will be downloaded in image/ directory and VGGNet19 model will be downloaded in data/ directory.

$ cd show-attend-and-tell-tensorflow
$ pip install -r requirements.txt
$ chmod +x ./download.sh
$ ./download.sh

For feeding the image to the VGGNet, you should resize the MSCOCO image dataset to the fixed size of 224x224. Run command below then resized images will be stored in image/train2014_resized/ and image/val2014_resized/ directory.

$ python resize.py

Before training the model, you have to preprocess the MSCOCO caption dataset. To generate caption dataset and image feature vectors, run command below.

$ python prepro.py

Train the model

To train the image captioning model, run command below.

$ python train.py

(optional) Tensorboard visualization

I have provided a tensorboard visualization for real-time debugging. Open the new terminal, run command below and open http://localhost:6005/ into your web browser.

$ tensorboard --logdir='./log' --port=6005 

Evaluate the model

To generate captions, visualize attention weights and evaluate the model, please see evaluate_model.ipynb.


Results


Training data

(1) Generated caption: A plane flying in the sky with a landing gear down.

alt text

(2) Generated caption: A giraffe and two zebra standing in the field.

alt text

Validation data

(1) Generated caption: A large elephant standing in a dry grass field.

alt text

(2) Generated caption: A baby elephant standing on top of a dirt field.

alt text

Test data

(1) Generated caption: A plane flying over a body of water.

alt text

(2) Generated caption: A zebra standing in the grass near a tree.

alt text

Owner
Yunjey Choi
Yunjey Choi
A Pytorch Implementation of a continuously rate adjustable learned image compression framework.

GainedVAE A Pytorch Implementation of a continuously rate adjustable learned image compression framework, Gained Variational Autoencoder(GainedVAE). N

39 Dec 24, 2022
Social Network Ads Prediction

Social network advertising, also social media targeting, is a group of terms that are used to describe forms of online advertising that focus on social networking services.

Khazar 2 Jan 28, 2022
Shuwa Gesture Toolkit is a framework that detects and classifies arbitrary gestures in short videos

Shuwa Gesture Toolkit is a framework that detects and classifies arbitrary gestures in short videos

Google 89 Dec 22, 2022
catch-22: CAnonical Time-series CHaracteristics

catch22 - CAnonical Time-series CHaracteristics About catch22 is a collection of 22 time-series features coded in C that can be run from Python, R, Ma

Carl H Lubba 229 Oct 21, 2022
[NeurIPS'21 Spotlight] PyTorch code for our paper "Aligned Structured Sparsity Learning for Efficient Image Super-Resolution"

ASSL This repository is for a new network pruning method (Aligned Structured Sparsity Learning, ASSL) for efficient single image super-resolution (SR)

Huan Wang 47 Nov 28, 2022
History Aware Multimodal Transformer for Vision-and-Language Navigation

History Aware Multimodal Transformer for Vision-and-Language Navigation This repository is the official implementation of History Aware Multimodal Tra

Shizhe Chen 46 Nov 23, 2022
Python Library for Signal/Image Data Analysis with Transport Methods

PyTransKit Python Transport Based Signal Processing Toolkit Website and documentation: https://pytranskit.readthedocs.io/ Installation The library cou

24 Dec 23, 2022
A high performance implementation of HDBSCAN clustering.

HDBSCAN HDBSCAN - Hierarchical Density-Based Spatial Clustering of Applications with Noise. Performs DBSCAN over varying epsilon values and integrates

2.3k Jan 02, 2023
[SIGGRAPH Asia 2019] Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning

AGIS-Net Introduction This is the official PyTorch implementation of the Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning. paper | suppl

Yue Gao 102 Jan 02, 2023
Optimal Adaptive Allocation using Deep Reinforcement Learning in a Dose-Response Study

Optimal Adaptive Allocation using Deep Reinforcement Learning in a Dose-Response Study Supplementary Materials for Kentaro Matsuura, Junya Honda, Imad

Kentaro Matsuura 4 Nov 01, 2022
Code for the paper Progressive Pose Attention for Person Image Generation in CVPR19 (Oral).

Pose-Transfer Code for the paper Progressive Pose Attention for Person Image Generation in CVPR19(Oral). The paper is available here. Video generation

Tengteng Huang 679 Jan 04, 2023
Ppq - A powerful offline neural network quantization tool with custimized IR

PPL Quantization Tool(PPL 量化工具) PPL Quantization Tool (PPQ) is a powerful offlin

605 Jan 03, 2023
Code for CVPR2021 "Visualizing Adapted Knowledge in Domain Transfer". Visualization for domain adaptation. #explainable-ai

Visualizing Adapted Knowledge in Domain Transfer @inproceedings{hou2021visualizing, title={Visualizing Adapted Knowledge in Domain Transfer}, auth

Yunzhong Hou 80 Dec 25, 2022
SSD: A Unified Framework for Self-Supervised Outlier Detection [ICLR 2021]

SSD: A Unified Framework for Self-Supervised Outlier Detection [ICLR 2021] Pdf: https://openreview.net/forum?id=v5gjXpmR8J Code for our ICLR 2021 pape

Princeton INSPIRE Research Group 113 Nov 27, 2022
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

WebDataset WebDataset is a PyTorch Dataset (IterableDataset) implementation providing efficient access to datasets stored in POSIX tar archives and us

1.1k Jan 08, 2023
[CVPR 2022 Oral] Rethinking Minimal Sufficient Representation in Contrastive Learning

Rethinking Minimal Sufficient Representation in Contrastive Learning PyTorch implementation of Rethinking Minimal Sufficient Representation in Contras

36 Nov 23, 2022
Frigate - NVR With Realtime Object Detection for IP Cameras

A complete and local NVR designed for HomeAssistant with AI object detection. Uses OpenCV and Tensorflow to perform realtime object detection locally for IP cameras.

Blake Blackshear 6.4k Dec 31, 2022
Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data

FTLNet_Pytorch Pytorch codes for Feature Transfer Learning for Face Recognition with Under-Represented Data 1. Introduction This repo is an unofficial

1 Nov 04, 2020
some academic posters as references. May we have in-person poster session soon!

some academic posters as references. May we have in-person poster session soon!

Bolei Zhou 472 Jan 06, 2023
Code for "Universal inference meets random projections: a scalable test for log-concavity"

How to use this repository This repository contains code to replicate the results of "Universal inference meets random projections: a scalable test fo

Robin Dunn 0 Nov 21, 2021