Implementation of EAST scene text detector in Keras

Last update: Nov 15, 2022

Overview

EAST: An Efficient and Accurate Scene Text Detector

This is a Keras implementation of EAST based on a Tensorflow implementation made by argman.

The original paper by Zhou et al. is available on arxiv.

Only RBOX geometry is implemented
Differences from the original paper
- Uses ResNet-50 instead of PVANet
- Uses dice loss function instead of balanced binary cross-entropy
- Uses AdamW optimizer instead of the original Adam

The implementation of AdamW optimizer is borrowed from this repository.

The code should run under both Python 2 and Python 3.

Requirements

Keras 2.0 or higher, and TensorFlow 1.0 or higher should be enough.

The code should run with Keras 2.1.5. If you use Keras 2.2 or higher, you have to remove ZeroPadding2D from the model.py file. Specifically, replace the line containing ZeroPadding2D with x = concatenate([x, resnet.get_layer('activation_10').output], axis=3).

I will add a list of packages and their versions under which no errors should occur later.

Data

You can use your own data, but the annotation files need to conform the ICDAR 2015 format.

ICDAR 2015 dataset can be downloaded from this site. You need the data from Task 4.1 Text Localization.
You can also download the MLT dataset, which uses the same annotation style as ICDAR 2015, there.

Alternatively, you can download a training dataset consisting of all training images from ICDAR 2015 and ICDAR 2013 datasets with annotation files in ICDAR 2015 format here.
You can also get a subset of validation images from the MLT 2017 dataset containing only images with text in the Latin alphabet for validation here.
The original datasets are distributed by the organizers of the Robust Reading Competition and are licensed under the CC BY 4.0 license.

Training

You need to put all of your training images and their corresponding annotation files in one directory. The annotation files have to be named gt_IMAGENAME.txt.
You also need a directory for validation data, which requires the same structure as the directory with training images.

Training is started by running train.py. It accepts several arguments including path to training and validation data, and path where you want to save trained checkpoint models. You can see all of the arguments you can specify in the train.py file.

Execution example

python train.py --gpu_list=0,1 --input_size=512 --batch_size=12 --nb_workers=6 --training_data_path=../data/ICDAR2015/train_data/ --validation_data_path=../data/MLT/val_data_latin/ --checkpoint_path=tmp/icdar2015_east_resnet50/

You can download a model trained on ICDAR 2015 and 2013 here. It achieves 0.802 F-score on ICDAR 2015 test set. You also need to download this JSON file of the model to be able to use it.

Test

The images you want to classify have to be in one directory, whose path you have to pass as an argument. Classification is started by running eval.py with arguments specifying path to the images to be classified, the trained model, and a directory which you want to save the output in.

Execution example

python eval.py --gpu_list=0 --test_data_path=../data/ICDAR2015/test/ --model_path=tmp/icdar2015_east_resnet50/model_XXX.h5 --output_dir=tmp/icdar2015_east_resnet50/eval/

Implementation of EAST scene text detector in Keras

Related tags

Overview

EAST: An Efficient and Accurate Scene Text Detector

Requirements

Data

Training

Execution example

Test

Execution example

Detection examples

Owner

Jan Zdenek

An Implementation of the FOTS: Fast Oriented Text Spotting with a Unified Network

A program that takes in the hand gesture displayed by the user and translates ASL.

Code for CVPR 2022 paper "Bailando: 3D dance generation via Actor-Critic GPT with Choreographic Memory"

A tool for extracting text from scanned documents (via OCR), with user-defined post-processing.

Aloception is a set of package for computer vision: aloscene, alodataset, alonet.

This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

[EMNLP 2021] Improving and Simplifying Pattern Exploiting Training

This project is basically to draw lines with your hand, using python, opencv, mediapipe.

An official PyTorch implementation of the paper "Learning by Aligning: Visible-Infrared Person Re-identification using Cross-Modal Correspondences", ICCV 2021.

Automatically remove the mosaics in images and videos, or add mosaics to them.

[BMVC'21] Official PyTorch Implementation of Grounded Situation Recognition with Transformers

Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

一键翻译各类图片内文字

Python-based tools for document analysis and OCR

Official code for "Bridging Video-text Retrieval with Multiple Choice Questions", CVPR 2022 (Oral).

Autonomous Driving project for Euro Truck Simulator 2

A facial recognition program that plays a alarm (mp3 file) when a person i seen in the room. A basic theif using Python and OpenCV

Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML.

An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come

第一届西安交通大学人工智能实践大赛（2018AI实践大赛--图片文字识别）第一名；仅采用densenet识别图中文字