This project modify tensorflow object detection api code to predict oriented bounding boxes. It can be used for scene text detection.

Overview

This is an oriented object detector based on tensorflow object detection API. Most of the code is not changed except for those related to the need of predicinting oriented bounding boxes rather than regular horizontal bounding boxes.

Many tasks need to predict an oriented bounding box, e.g: Scene Text Detection. Check out the detection results: (Note that this code doesn't train model to recognize text. Only the bounding boxes are predicted)

Goals

For each predicted bounding boxes, in addition to the regular horizontal bounding box, we need to predict one oriented bounding box. Basically it means that we need to regress to an oriented bounding box. In this project, we simply regress to the encoded 4 corners of the oriented bounding boxes(8 values). See below equation for the encoding function. j is the index for each corner. g represents ground truth oriented bounding boxes. w_a and h_a is the anchor width and height, respectively.

The reason of adopting this Faster RCNN/SSD framework:

There are many object detection framework to be used. We adopt this one as the basis for the following reasons:

Highly modular designed code

It's easy to change the encoding scheme in the code. Simply changing the code in box_coders folder. The encoding using [R2CNN] (https://arxiv.org/abs/1706.09579) will be released soon. Training model with faster rcnn or ssd is easy to modify.

Natural integration with slim nets

It's easy to change feature extraction CNN backbone by using slim nets.

Easy and clear configuration setting with google protobuf

Changing the network configuration setting is easy. For example, to change the different aspect ratios of the anchors used, simply changing the grid_anchor_generator in the configuration file.

Many supporting codes have been provided.

It provides many supporting code such as exporting the trained model to a frozen graph that can be used in production(For example, in your c++ project). Check out my another project DeepSceneTextReader which used the frozen graph trained with this code.

Code Changed compared to the original object detection implementation

Import path for each python file

You do not need to use blaze build to build the code. Simply run the code from the root directory for fast experiment.

proto files

added oriented related filed to the proto files. Please build them with

protoc protos/*.proto --python_out=.

Box encoding scheme

added code for encode and decode oriented bounding boxes

Added code in meta architecture for supporting oriented bounding box prediction

Add code to predict the oriented bounding boxes for each proposal. At the same time the add code to calculate the oriented bounding boxes regression loss.

Other changes regarding data reading, data decoding and others

Usage:

Create the tfrecord data

Use the code create_text_dataset.py to create the tfexample data files used for training. You can create ICDAR 2015 and ICDAR 2013 data for training.

Download the pretrained weight

If you are training faster rcnn inception resnet v2 model, you can download the pretrained weight from tensorflow model zoo.

change the specific configuration setting.

See data/faster_rcnn_inception_resnet_v2_atrous_text.config for example configuration The parameter: second_stage_localization_loss_weight_oriented is the weight for the oriented bounding box prediction.

Train the model

Example running script is provided: train_faster_rcnn_inception_resnet_v2.sh

Evaluation

Trained with default configuration with ResNet Inception V2 or ResNet 101 backbone on ICDAR 2013 + ICDAR 2015 training set. The performance on ICDAR 2015 dataset.

Backbone Recall Precision F-1
ResNet Inception V2 0.7371 0.8057 0.7699
ResNet 101 0.6861 0.8213 0.7476

To improve the performance, try changing the configuration settings. Many scene text detectors have more aspect ratios anchors for each location than that was used for regular object detection.

TODO

  1. Provide support for R2CNN training.

Reference and Related Projects

Contact:

Owner
Dafang He
Ph.D student at the Penn State University. Focusing on machine learning and computer vision.
Dafang He
The world's simplest facial recognition api for Python and the command line

Face Recognition You can also read a translated version of this file in Chinese 简体中文版 or in Korean 한국어 or in Japanese 日本語. Recognize and manipulate fa

Adam Geitgey 47k Jan 07, 2023
Détection de créneaux de vaccination disponibles pour l'outil ViteMaDose

Vite Ma Dose ! est un outil open source de CovidTracker permettant de détecter les rendez-vous disponibles dans votre département afin de vous faire v

CovidTracker 239 Dec 13, 2022
Reference Code for AAAI-20 paper "Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels"

Reference Code for AAAI-20 paper "Multi-Stage Self-Supervised Learning for Graph Convolutional Networks on Graphs with Few Labels" Please refer to htt

Ke Sun 1 Feb 14, 2022
scene-linear test images

Scene-Referred Image Collection A collection of OpenEXR Scene-Referred images, encoded as max 2048px width, DWAA 80 compression. All exrs are encoded

Gralk Klorggson 7 Aug 25, 2022
The open source extract transaction infomation by using OCR.

Transaction OCR Mã nguồn trích xuất thông tin transaction từ file scaned pdf, ở đây tôi lựa chọn tài liệu sao kê công khai của Thuy Tien. Mã nguồn có

Nguyen Xuan Hung 18 Jun 02, 2022
✌️Using this you can control your PC/Laptop volume by Hand Gestures created with Python.

Hand Gesture Volume Controller ✋ Hand recognition 👆 Finger recognition 🔊 you can decrease and increase volume Demo Code Firstly I have created a Mod

Abbas Ataei 19 Nov 17, 2022
Code for CVPR 2022 paper "SoftGroup for Instance Segmentation on 3D Point Clouds"

SoftGroup We provide code for reproducing results of the paper SoftGroup for 3D Instance Segmentation on Point Clouds (CVPR 2022) Author: Thang Vu, Ko

Thang Vu 231 Dec 27, 2022
Steve Tu 71 Dec 30, 2022
computer vision, image processing and machine learning on the web browser or node.

Image processing and Machine learning labs   computer vision, image processing and machine learning on the web browser or node note Fast Fourier Trans

ryohei tanaka 487 Nov 11, 2022
pyntcloud is a Python library for working with 3D point clouds.

pyntcloud is a Python library for working with 3D point clouds.

David de la Iglesia Castro 1.2k Jan 07, 2023
A bot that plays TFT using OCR. Keeps track of bench, board, items, and plays the user defined team comp.

NOTES: To ensure best results, make sure you are running this on a computer that has decent specs. 1920x1080 fullscreen is required in League, game mu

francis 125 Dec 30, 2022
Fully-automated scripts for collecting AI-related papers

AI-Paper-Collector Web demo: https://ai-paper-collector.vercel.app/ (recommended) Colab notebook: here Motivation Fully-automated scripts for collecti

772 Dec 30, 2022
Framework for the Complete Gaze Tracking Pipeline

Framework for the Complete Gaze Tracking Pipeline The figure below shows a general representation of the camera-to-screen gaze tracking pipeline [1].

Pascal 20 Jan 06, 2023
Apply different text recognition services to images of handwritten documents.

Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume

Caltech Library 117 Jan 02, 2023
The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes"

Mask TextSpotter A Pytorch implementation of Mask TextSpotter along with its extension can be find here Introduction This is the official implementati

Pengyuan Lyu 261 Nov 21, 2022
Convert Text-to Handwriting Using Python

Convert Text-to Handwriting Using Python Description In this project we'll use python library that's "pywhatkit" for converting text to handwriting. t

8 Nov 19, 2022
This is a tensorflow re-implementation of PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network.My blog:

PSENet: Shape Robust Text Detection with Progressive Scale Expansion Network Introduction This is a tensorflow re-implementation of PSENet: Shape Robu

Michael liu 498 Dec 30, 2022
A tool combining EasyOCR and LaMa to automatically detect text and replace it with an inpainted background.

EasyLaMa (WIP) This is a tool combining EasyOCR and LaMa to automatically detect text and replace it with an inpainted background. Installation For GP

3 Sep 17, 2022
Deep learning based page layout analysis

Deep Learning Based Page Layout Analyze This is a Python implementaion of page layout analyze tool. The goal of page layout analyze is to segment page

186 Dec 29, 2022
Motion Detection Squid Game with OpenCV Python

*Motion Detection Squid Game with OpenCV Python i am newbie in python. In this project I made a simple game to follow the trend about the red light gr

Nayan 17 Nov 22, 2022