This project modifies the TensorFlow Object Detection API code to predict oriented bounding boxes. It can be used for scene text detection.

Overview

This is an oriented object detector based on the TensorFlow Object Detection API. Most of the code is unchanged, except for the parts needed to predict oriented bounding boxes rather than regular horizontal bounding boxes.

Many tasks, such as scene text detection, require predicting oriented bounding boxes. Check out the detection results. (Note that this code does not train a model to recognize text; only the bounding boxes are predicted.)

Goals

For each predicted box, in addition to the regular horizontal bounding box, we also predict one oriented bounding box, which means we need to regress to an oriented bounding box. In this project, we simply regress to the encoded 4 corners of the oriented bounding box (8 values). In the encoding function, j is the index of each corner, g denotes the ground-truth oriented bounding box, and w_a and h_a are the anchor width and height, respectively.
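A minimal sketch of one plausible form of this corner encoding follows. It assumes (this is not stated above) that each ground-truth corner (x_j^g, y_j^g) is offset from the anchor center (x_a, y_a) and normalized by the anchor size, i.e. t_{x_j} = (x_j^g - x_a) / w_a and t_{y_j} = (y_j^g - y_a) / h_a. The function names and argument layout are hypothetical and are not the code in the box_coders folder.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def encode_corners(gt_corners: Sequence[Point],
                   anchor_center: Point,
                   anchor_size: Tuple[float, float]) -> List[float]:
    """Encode 4 ground-truth corners as 8 anchor-relative offsets (assumed form)."""
    if len(gt_corners) != 4:
        raise ValueError("expected exactly 4 corners")
    x_a, y_a = anchor_center  # assumption: offsets are taken from the anchor center
    w_a, h_a = anchor_size
    codes: List[float] = []
    for x_g, y_g in gt_corners:  # j = 0..3
        codes.append((x_g - x_a) / w_a)  # horizontal offset, normalized by anchor width
        codes.append((y_g - y_a) / h_a)  # vertical offset, normalized by anchor height
    return codes


def decode_corners(codes: Sequence[float],
                   anchor_center: Point,
                   anchor_size: Tuple[float, float]) -> List[Point]:
    """Invert encode_corners: map 8 regression outputs back to 4 image-space corners."""
    x_a, y_a = anchor_center
    w_a, h_a = anchor_size
    return [(codes[2 * j] * w_a + x_a, codes[2 * j + 1] * h_a + y_a) for j in range(4)]


corners = [(30, 40), (70, 40), (70, 60), (30, 60)]
targets = encode_corners(corners, anchor_center=(50, 50), anchor_size=(40, 20))
print(targets)
print(decode_corners(targets, anchor_center=(50, 50), anchor_size=(40, 20)))
```

Normalizing by the anchor size keeps the regression targets on a similar scale for small and large anchors, which is the same motivation as the usual horizontal box encoding.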

Reasons for adopting this Faster R-CNN/SSD framework:

There are many object detection frameworks available. We adopt this one as the basis for the following reasons:

Highly modular code design

It's easy to change the encoding scheme: simply change the code in the box_coders folder. An encoding based on R2CNN (https://arxiv.org/abs/1706.09579) will be released soon. Training the model with either Faster R-CNN or SSD is easy to set up.
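As an illustration of swapping encodings, the sketch below converts a 4-corner rectangle to the (x1, y1, x2, y2, h) representation described in the R2CNN paper (the first two points in clockwise order plus the box height) and back. It assumes image coordinates with y pointing down and corners already ordered clockwise; it is a hypothetical helper, not the R2CNN encoder this project plans to release.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
R2CNNBox = Tuple[float, float, float, float, float]  # (x1, y1, x2, y2, h)


def corners_to_r2cnn(corners: Sequence[Point]) -> R2CNNBox:
    """Convert a rectangle given as 4 clockwise corners to (x1, y1, x2, y2, h).

    Assumption: corners[0] and corners[1] are the first two clockwise points,
    and h is the length of the edge from corners[1] to corners[2].
    """
    (x1, y1), (x2, y2), (x3, y3), _ = corners
    h = math.hypot(x3 - x2, y3 - y2)
    return (x1, y1, x2, y2, h)


def r2cnn_to_corners(box: R2CNNBox) -> List[Point]:
    """Rebuild the 4 clockwise corners (image coordinates, y pointing down)."""
    x1, y1, x2, y2, h = box
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    # Unit normal to the first edge; with y pointing down this turns clockwise on screen.
    nx, ny = -dy / length, dx / length
    return [(x1, y1), (x2, y2), (x2 + nx * h, y2 + ny * h), (x1 + nx * h, y1 + ny * h)]


print(corners_to_r2cnn([(0, 0), (4, 0), (4, 3), (0, 3)]))
print(r2cnn_to_corners((0, 0, 3, 4, 2.0)))
```

Because both encodings describe the same rectangle, a new box coder mainly changes the regression targets, while the rest of the detection pipeline can stay the same.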

Natural integration with slim nets

It's easy to change the feature extraction CNN backbone by using slim nets.

Easy and clear configuration with Google protobuf

Changing the network configuration is easy. For example, to change the aspect ratios of the anchors, simply modify the grid_anchor_generator in the configuration file.
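To show what the scales and aspect ratios in grid_anchor_generator control, here is a small sketch that computes anchor shapes at one grid location. It follows our understanding of the TF Object Detection API convention (aspect ratio = width / height, height = scale * base_height / sqrt(ratio), width = scale * base_width * sqrt(ratio), default base size 256 x 256). The function is a hypothetical stand-alone helper, not part of the API, and it ignores strides and offsets.

```python
import math
from typing import List, Sequence, Tuple


def anchor_shapes(scales: Sequence[float],
                  aspect_ratios: Sequence[float],
                  base_anchor_size: Tuple[float, float] = (256.0, 256.0),
                  ) -> List[Tuple[float, float]]:
    """Return the (height, width) of every anchor placed at one grid location.

    aspect_ratio is width / height; for a given scale the anchor area stays
    scale^2 * base_h * base_w while the ratio stretches the anchor.
    """
    base_h, base_w = base_anchor_size
    shapes = []
    for scale in scales:
        for ratio in aspect_ratios:
            root = math.sqrt(ratio)
            shapes.append((scale * base_h / root, scale * base_w * root))
    return shapes


# Wide ratios yield the elongated anchors that tend to suit lines of text.
for h, w in anchor_shapes(scales=[0.5, 1.0], aspect_ratios=[1.0, 2.0, 4.0]):
    print(f"{h:7.1f} x {w:7.1f}")
```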

Plenty of supporting code is provided.

It provides a lot of supporting code, such as exporting the trained model to a frozen graph that can be used in production (for example, in a C++ project). Check out my other project, DeepSceneTextReader, which uses a frozen graph trained with this code.

Code changes compared to the original object detection implementation

Import paths in each Python file

You do not need to build the code with blaze/Bazel. Simply run the code from the root directory for fast experimentation.

Proto files

Added fields related to oriented boxes to the proto files. Build them with:

protoc protos/*.proto --python_out=.

Box encoding scheme

Added code to encode and decode oriented bounding boxes.

Added code in the meta architecture to support oriented bounding box prediction

Added code to predict the oriented bounding box for each proposal, together with code to compute the oriented bounding box regression loss.

Other changes related to data reading, data decoding, and more
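The sketch below shows one common way such a regression loss can be formed: a smooth-L1 (Huber) penalty over the 8 encoded corner values, scaled by a weight and added to the horizontal box loss. Using smooth L1 for the oriented term is an assumption here (the TF Object Detection API commonly uses it for box regression), and localization_loss and its parameters are hypothetical names; the oriented weight presumably corresponds to second_stage_localization_loss_weight_oriented in the configuration.

```python
from typing import Sequence


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Sum of smooth-L1 (Huber) penalties over paired regression values."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        # Quadratic near zero, linear for large errors (less sensitive to outliers).
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total


def localization_loss(horizontal_loss: float,
                      oriented_pred: Sequence[float],
                      oriented_target: Sequence[float],
                      oriented_weight: float) -> float:
    """Hypothetical combination: horizontal loss plus a weighted oriented term."""
    if len(oriented_pred) != 8 or len(oriented_target) != 8:
        raise ValueError("oriented boxes are encoded as 8 values (4 corners)")
    return horizontal_loss + oriented_weight * smooth_l1(oriented_pred, oriented_target)


print(localization_loss(0.3, [0.5] + [0.0] * 7, [0.0] * 8, oriented_weight=2.0))
```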

Usage:

Create the TFRecord data

Use create_text_dataset.py to create the tf.Example data files used for training. It can create training data from ICDAR 2015 and ICDAR 2013.
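For reference, each line of an ICDAR 2015 ground-truth file lists 4 corners followed by the transcription (x1,y1,x2,y2,x3,y3,x4,y4,text), with "###" marking illegible "don't care" text. The sketch below parses one such line into the two targets this detector needs: the oriented corners and the enclosing horizontal box. It is a hypothetical helper and does not reproduce create_text_dataset.py; ICDAR 2013 ground truth uses a different, axis-aligned format that is not handled here.

```python
from typing import List, NamedTuple, Tuple


class TextBox(NamedTuple):
    corners: List[Tuple[float, float]]        # 4 (x, y) corners in file order
    hbox: Tuple[float, float, float, float]   # enclosing (xmin, ymin, xmax, ymax)
    text: str
    dont_care: bool                           # True when the transcription is "###"


def parse_icdar2015_line(line: str) -> TextBox:
    # Drop a UTF-8 BOM and the line ending; split at most 8 times because
    # the transcription itself may contain commas.
    fields = line.lstrip("\ufeff").rstrip("\r\n").split(",", 8)
    coords = [float(v) for v in fields[:8]]
    text = fields[8] if len(fields) > 8 else ""
    corners = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    hbox = (min(xs), min(ys), max(xs), max(ys))
    return TextBox(corners, hbox, text, text == "###")


print(parse_icdar2015_line("377,117,463,117,465,130,378,130,Genaxis Theatre"))
```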

Download the pretrained weights

If you are training the Faster R-CNN Inception-ResNet-v2 model, you can download the pretrained weights from the TensorFlow model zoo.

Change the configuration settings

See data/faster_rcnn_inception_resnet_v2_atrous_text.config for an example configuration. The parameter second_stage_localization_loss_weight_oriented is the loss weight for the oriented bounding box prediction.

Train the model

An example training script is provided: train_faster_rcnn_inception_resnet_v2.sh

Evaluation

Models were trained with the default configuration, using a ResNet Inception V2 or ResNet 101 backbone, on the ICDAR 2013 + ICDAR 2015 training sets. Performance on the ICDAR 2015 dataset:

Backbone              Recall   Precision   F-1
ResNet Inception V2   0.7371   0.8057      0.7699
ResNet 101            0.6861   0.8213      0.7476
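The F-1 column is the harmonic mean of precision and recall. This short check recomputes it from the Recall and Precision columns; both rows reproduce the reported F-1 values to four decimal places.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


results = [("ResNet Inception V2", 0.7371, 0.8057), ("ResNet 101", 0.6861, 0.8213)]
for backbone, recall, precision in results:
    print(f"{backbone}: F-1 = {f1_score(precision, recall):.4f}")
```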

To improve performance, try changing the configuration settings. Many scene text detectors use more anchor aspect ratios per location than are typically used for regular object detection.

TODO

  1. Provide support for R2CNN training.


Contact:

Owner: Dafang He, Ph.D. student at Penn State University, focusing on machine learning and computer vision.