This repository lets you train neural networks models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning frameworks on the IAM Dataset.

Overview

Handwritten Text Recognition (OCR) with MXNet Gluon

These notebooks have been created by Jonathan Chung, as part of his internship as Applied Scientist @ Amazon AI, in collaboration with Thomas Delteil who built the original prototype.

Setup

git clone https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet --recursive

You need to install SCLITE for WER evaluation You can follow the following bash script from this folder:

cd ..
git clone https://github.com/usnistgov/SCTK
cd SCTK
export CXXFLAGS="-std=c++11" && make config
make all
make check
make install
make doc
cd -

You also need hsnwlib

pip install pybind11 numpy setuptools
cd ..
git clone https://github.com/nmslib/hnswlib
cd hnswlib/python_bindings
python setup.py install
cd ../..

if "AssertionError: Please enter credentials for the IAM dataset in credentials.json or as arguments" occurs rename credentials.json.example and to credentials.json with your username and password.

Overview

The pipeline is composed of 3 steps:

The entire inference pipeline can be found in this notebook. See the pretrained models section for the pretrained models.

A recorded talk detailing the approach is available on youtube. [video]

The corresponding slides are available on slideshare. [slides]

Pretrained models:

You can get the models by running python get_models.py:

Sample results

The greedy, lexicon search, and beam search outputs present similar and reasonable predictions for the selected examples. In Figure 6, interesting examples are presented. The first line of Figure 6 show cases where the lexicon search algorithm provided fixes that corrected the words. In the top example, “tovely” (as it was written) was corrected “lovely” and “woved” was corrected to “waved”. In addition, the beam search output corrected “a” into “all”, however it missed a space between “lovely” and “things”. In the second example, “selt” was converted to “salt” with the lexicon search output. However, “selt” was erroneously converted to “self” in the beam search output. Therefore, in this example, beam search performed worse. In the third example, none of the three methods significantly provided comprehensible results. Finally, in the forth example, the lexicon search algorithm incorrectly converted “forhim” into “forum”, however the beam search algorithm correctly identified “for him”.

Dataset:

  • To use test_iam_dataset.ipynb, create credentials.json using credentials.json.example and editing the appropriate field. The username and password can be obtained from http://www.fki.inf.unibe.ch/DBs/iamDB/iLogin/index.php.

  • It is recommended to use an instance with 32GB+ RAM and 100GB disk size, a GPU is also recommended. A p3.2xlarge would be the recommended starter instance on AWS for this project

Appendix

1) Handwritten area

Model architecture

Results

2) Line Detection

Model architecture

Results

3) Handwritten text recognition

Model architecture

Results

Owner
Amazon Web Services - Labs
AWS Labs
Amazon Web Services - Labs
Virtual Zoom Gesture using OpenCV

Virtual_Zoom_Gesture I have created a virtual zoom gesture where we can Zoom in and Zoom out any image and even we can move that image anywhere on the

Mudit Sinha 2 Dec 26, 2021
Python tool that takes the OCR.space JSON output as input and draws a text overlay on top of the image.

OCR.space OCR Result Checker = Draw OCR overlay on top of image Python tool that takes the OCR.space JSON output as input, and draws an overlay on to

a9t9 4 Oct 18, 2022
A simple python program to record security cam footage by detecting a face and body of a person in the frame.

SecurityCam A simple python program to record security cam footage by detecting a face and body of a person in the frame. This code was created by me,

1 Nov 08, 2021
Using Opencv ,based on Augmental Reality(AR) and will show the feature matching of image and then by finding its matching

Using Opencv ,this project is based on Augmental Reality(AR) and will show the feature matching of image and then by finding its matching ,it will just mask that image . This project ,if used in cctv

1 Feb 13, 2022
This is the implementation of the paper "Gated Recurrent Convolution Neural Network for OCR"

Gated Recurrent Convolution Neural Network for OCR This project is an implementation of the GRCNN for OCR. For details, please refer to the paper: htt

90 Dec 22, 2022
Smart computer vision application

Smart-computer-vision-application Backend : opencv and python Library required:

2 Jan 31, 2022
Write-ups for the SwissHackingChallenge2021 CTF.

SwissHackingChallenge 2021 : Write-ups This repository contains a collection of my write-ups for challenges solved during the SwissHackingChallenge (S

Julien Béguin 3 Jun 07, 2021
Code for CVPR'2022 paper ✨ "Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model"

PPE ✨ Repository for our CVPR'2022 paper: Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-

Zipeng Xu 34 Nov 28, 2022
Source Code for AAAI 2022 paper "Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching"

Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching This repository is an official implementation of

HKUST-KnowComp 13 Sep 08, 2022
code for our ICCV 2021 paper "DeepCAD: A Deep Generative Network for Computer-Aided Design Models"

DeepCAD This repository provides source code for our paper: DeepCAD: A Deep Generative Network for Computer-Aided Design Models Rundi Wu, Chang Xiao,

Rundi Wu 85 Dec 31, 2022
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

27 Jan 08, 2023
Distort a video using Seam Carving (video) and Vibrato effect (sound)

Distort videos Applies a Seam Carving algorithm (aka liquid rescale) on every frame of a video, and a vibrato effect on the audio to distort the video

AlexZeGamer 6 Dec 06, 2022
OCR powered screen-capture tool to capture information instead of images

NormCap OCR powered screen-capture tool to capture information instead of images. Links: Repo | PyPi | Releases | Changelog | FAQs Content: Quickstart

575 Dec 31, 2022
Detect textlines in document images

Textline Detection Detect textlines in document images Introduction This tool performs border, region and textline detection from document image data

QURATOR-SPK 70 Jun 30, 2022
Apply different text recognition services to images of handwritten documents.

Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume

Caltech Library 117 Jan 02, 2023
Textboxes_plusplus implementation with Tensorflow (python)

TextBoxes++-TensorFlow TextBoxes++ re-implementation using tensorflow. This project is greatly inspired by slim project And many functions are modifie

81 Dec 07, 2022
轻量级公式 OCR 小工具:一键识别各类公式图片,并转换为 LaTeX 格式

QC-Formula | 青尘公式 OCR 介绍 轻量级开源公式 OCR 小工具:一键识别公式图片,并转换为 LaTeX 格式。 支持从 电脑本地 导入公式图片;(后续版本将支持直接从网页导入图片) 公式图片支持 .png / .jpg / .bmp,大小为 4M 以内均可; 支持印刷体及手写体,前

青尘工作室 26 Jan 07, 2023
STEFANN: Scene Text Editor using Font Adaptive Neural Network

STEFANN: Scene Text Editor using Font Adaptive Neural Network @ The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

Prasun Roy 208 Dec 11, 2022
かの有名なあの東方二次創作ソング、「bad apple!」のMVをPythonでやってみたって話

bad apple!! 内容 このプログラムは、bad apple!(feat. nomico)のPVをPythonを用いて再現しよう!という内容です。 実はYoutube並びにGithub上に似たようなプログラムがあったしなんならそっちの方が結構良かったりするんですが、一応公開しますw 使い方 こ

赤紫 8 Jan 05, 2023
Usando o Amazon Textract como OCR para Extração de Dados no DynamoDB

dio-live-textract2 Repositório de código para o live coding do dia 05/10/2021 sobre extração de dados estruturados e gravação em banco de dados a part

hugoportela 0 Jan 19, 2022