Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Last update: Apr 06, 2022

Overview

Deep Learning - Image Captioning with Convolutional and Recurrent Neural Nets

========================================================================

Author: Jonathan Kuo
Python: 3.6.1
TensorFlow: 1.0.1
Keras: 2.0.4

Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Introduction

The Keras deep learning architecture of this project was inspired by the paper "Deep Visual-Semantic Alignments for Generating Image Descriptions" by Andrej Karpathy and Li Fei-Fei.

Given a dataset of images and their sentence descriptions, the project defines a Keras (TensorFlow backend) deep learning model that aligns detected image regions with segments of the descriptions. This learned alignment lets the model generate novel descriptions for test images.
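As a rough illustration (not the project's actual code), a trained next-word model can produce a new description by greedy decoding: start from a start token and repeatedly append the most probable word until an end token appears or a length limit is reached. `predict_next`, the token ids, and the toy predictor are hypothetical stand-ins for the trained Keras model and its vocabulary.

```python
def greedy_decode(image_features, predict_next, start_id, end_id, max_len=20):
    """Generate a caption one word at a time, always taking the most likely word.

    predict_next(image_features, prefix) is a hypothetical stand-in for the
    trained model: it returns a list of probabilities over the vocabulary.
    """
    caption = [start_id]
    for _ in range(max_len):
        probs = predict_next(image_features, caption)
        # Argmax: index of the highest probability (the first one wins on ties).
        next_id = max(range(len(probs)), key=lambda i: probs[i])
        if next_id == end_id:
            break
        caption.append(next_id)
    return caption[1:]  # drop the start token


# Hypothetical toy "model": maps the last id to a fixed next id.
SCRIPT = {1: 4, 4: 8, 8: 2}


def toy_predict(image_features, prefix, vocab_size=10):
    probs = [0.0] * vocab_size
    probs[SCRIPT[prefix[-1]]] = 1.0
    return probs


print(greedy_decode(None, toy_predict, start_id=1, end_id=2))  # [4, 8]
```

Greedy decoding is the simplest strategy; beam search, which keeps several candidate captions at each step, is a common alternative when caption quality matters more than speed.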

Dataset

Microsoft Common Objects in Context (MSCOCO) is an image recognition, segmentation, and captioning dataset. The training data comprise 123,000 images with their captions; the validation and test sets each contain 5,000 images with captions.
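Before training, captions are typically tokenized and mapped to integer ids. The following is a minimal preprocessing sketch that assumes lowercase alphabetic tokens and hypothetical `<pad>`, `<start>`, `<end>` and `<unk>` markers; the project's actual preprocessing may differ. Id 0 is reserved for padding, which matches the convention Keras follows when an `Embedding` layer is built with `mask_zero=True`.

```python
import re
from collections import Counter

# Hypothetical special tokens; the project's own markers may differ.
PAD, START, END, UNK = "<pad>", "<start>", "<end>", "<unk>"


def tokenize(caption):
    """Lowercase a caption and keep only runs of letters as words."""
    return re.findall(r"[a-z]+", caption.lower())


def build_vocab(captions, min_count=1):
    """Assign ids to words seen at least min_count times; ids 0-3 are reserved."""
    counts = Counter(w for c in captions for w in tokenize(c))
    vocab = {PAD: 0, START: 1, END: 2, UNK: 3}
    for word in sorted(w for w, n in counts.items() if n >= min_count):
        vocab[word] = len(vocab)
    return vocab


def encode(caption, vocab):
    """Wrap a caption in start/end markers and map unknown words to <unk>."""
    tokens = [START] + tokenize(caption) + [END]
    return [vocab.get(t, vocab[UNK]) for t in tokens]


vocab = build_vocab(["A man riding a horse.", "A dog on a beach."])
print(encode("A man on a horse.", vocab))  # [1, 4, 8, 9, 4, 7, 2]
```

Raising `min_count` shrinks the vocabulary by sending rare words to `<unk>`, which keeps the output softmax smaller on a corpus the size of MSCOCO.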

Architecture

A VGG16 CNN (loaded in Keras) with weights pre-trained on ImageNet serves as the CNN that recognizes objects in the image. Its final dense softmax layer, which classifies the 1,000 ImageNet classes, is removed so that the 4096-D activations of the preceding fully connected layer can be passed into the RNN (LSTM). The CNN weights are frozen, while the RNN weights are updated by backpropagation through time (BPTT). The CNN and LSTM outputs are merged and then passed into a second LSTM, which predicts the next word in the sequence. RMSprop is used as the optimizer; because it scales each parameter's update by a running average of recent gradient magnitudes, it helps counteract vanishing gradients.
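Because the second LSTM predicts the next word, each encoded caption can be expanded into several training examples: every prefix of the caption is an input, the word that follows it is the target, and the same 4096-D image vector is attached to each example. This is a minimal sketch that assumes captions are already encoded as ids with 0 reserved for padding and prefixes left-padded to a fixed length; `next_word_pairs` and `PAD_ID` are hypothetical names, not taken from the original repository.

```python
PAD_ID = 0  # hypothetical padding id, assuming the vocabulary reserves 0


def next_word_pairs(token_ids, max_len):
    """Expand one encoded caption into (padded prefix, next word id) pairs."""
    pairs = []
    for i in range(1, len(token_ids)):
        # Keep at most the max_len most recent tokens before position i.
        prefix = token_ids[max(0, i - max_len):i]
        # Left-pad so every input sequence has exactly max_len ids.
        padded = [PAD_ID] * (max_len - len(prefix)) + prefix
        pairs.append((padded, token_ids[i]))
    return pairs


# An already-encoded caption; the ids are illustrative only.
for prefix, target in next_word_pairs([1, 4, 8, 9, 4, 7, 2], max_len=5):
    print(prefix, "->", target)
```

Left-padding keeps the real words at the end of each sequence, right before the LSTM produces its final state, so the padding does not sit between the last word and the prediction.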

Demo

View the demo IPython notebook for model training and prediction on the MSCOCO dataset.
