Page to PAGE Layout Analysis Tool

Last update: Nov 24, 2022

Overview

P2PaLA

Page to PAGE Layout Analysis (P2PaLA) is a toolkit for Document Layout Analysis based on Neural Networks.

💥 Try our new DEMO for online baseline detection. ❗ ❗

If you find this toolkit useful in your research, please cite:

@misc{p2pala2017,
  author = {Lorenzo Quirós},
  title = {P2PaLA: Page to PAGE Layout Analysis toolkit},
  year = {2017},
  publisher = {GitHub},
  note = {GitHub repository},
  howpublished = {\url{https://github.com/lquirosd/P2PaLA}},
}

See the paper on arXiv for more details.

Requirements

Linux (OSX may work, but is untested).
Python (2.7 or 3.6; a conda virtual environment is recommended).
Numpy
PyTorch (1.0). PyTorch 0.3.1 compatible code is available on this branch.
OpenCV (3.4.5.20).
NVIDIA GPU + CUDA CuDNN (CPU mode and CUDA without CuDNN work, but are not recommended for training).
tensorboard-pytorch (v0.9) [Optional]: pip install tensorboardX. A different conda env is recommended to keep TensorFlow separate from PyTorch.
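
Before installing, it can help to check which of these packages the active environment already provides. The snippet below is a minimal sketch, not part of P2PaLA: the import names (numpy, torch, cv2, tensorboardX) are the usual ones for the packages listed above, and the helper find_missing is a hypothetical name introduced here. It only checks that the packages can be located, not their versions.

```python
import importlib.util

# Import names assumed for the packages listed above:
# Numpy -> numpy, PyTorch -> torch, OpenCV -> cv2, tensorboardX (optional).
REQUIRED = ("numpy", "torch", "cv2")
OPTIONAL = ("tensorboardX",)


def find_missing(modules):
    """Return the top-level module names from `modules` that cannot be located."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED)
    if missing:
        print("Missing required packages:", ", ".join(missing))
    else:
        print("All required packages found.")
    for name in find_missing(OPTIONAL):
        print("Optional package not found:", name)
```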

Install

python setup.py install

To install only the Python dependencies, use the conda requirements file:

conda env create --file conda_requirements.yml

Usage

Input data must follow the folder structure data_tag/page: images go in the data_tag folder and the XML files in its page subfolder. For example:

mkdir -p data/{train,val,test,prod}/page;
tree data;

data
├── prod
│   ├── page
│   │   ├── prod_0.xml
│   │   └── prod_1.xml
│   ├── prod_0.jpg
│   └── prod_1.jpg
├── test
│   ├── page
│   │   ├── test_0.xml
│   │   └── test_1.xml
│   ├── test_0.jpg
│   └── test_1.jpg
├── train
│   ├── page
│   │   ├── train_0.xml
│   │   └── train_1.xml
│   ├── train_0.jpg
│   └── train_1.jpg
└── val
    ├── page
    │   ├── val_0.xml
    │   └── val_1.xml
    ├── val_0.jpg
    └── val_1.jpg
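
The layout above can be checked before training with a few lines of Python. This is a sketch under stated assumptions rather than part of the toolkit: the helper find_unpaired is a hypothetical name, the list of image extensions is a guess (the example tree only shows .jpg), and it assumes each image and its XML file share the same base name, as in the tree above.

```python
from pathlib import Path

# Extensions are an assumption; the example tree above only shows .jpg images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_unpaired(data_dir):
    """Compare images in data_dir with XML files in data_dir/page.

    Returns (images_without_xml, xml_without_image) as sorted lists of stems.
    """
    data_dir = Path(data_dir)
    images = {p.stem for p in data_dir.iterdir()
              if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS}
    xmls = {p.stem for p in (data_dir / "page").glob("*.xml")}
    return sorted(images - xmls), sorted(xmls - images)
```

For example, find_unpaired("data/train") returns two empty lists when every image in data/train has a matching file in data/train/page.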

Run the tool.

python P2PaLA.py --config config.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"
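
When several runs need to be scripted, the same call can be assembled from Python. The sketch below uses only the flags shown in the command above; build_command is a hypothetical helper, and it assumes the script is launched from the repository root where P2PaLA.py lives.

```python
import subprocess
import sys


def build_command(config, tr_data, te_data, log_comment=""):
    """Assemble the P2PaLA.py call using only the flags shown above."""
    cmd = [sys.executable, "P2PaLA.py", "--config", config,
           "--tr_data", tr_data, "--te_data", te_data]
    if log_comment:
        cmd += ["--log_comment", log_comment]
    return cmd


if __name__ == "__main__":
    cmd = build_command("config.txt", "./data/train", "./data/test", "_foo")
    print(" ".join(cmd))
    # Uncomment to actually launch the run from the repository root:
    # subprocess.run(cmd, check=True)
```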

❗ Pre-trained models available here

Use TensorBoard to visualize train status:

tensorboard --logdir ./work/runs

The resulting PAGE-XML files are written to "./work/results/test/".
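
The result files can also be inspected programmatically. The following is a minimal sketch using only the Python standard library, assuming the output follows the standard PAGE layout in which TextLine elements hold a Baseline with a points attribute of the form "x,y x,y ..."; whether baselines are present depends on how the tool was configured. The function name read_baselines is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_baselines(xml_source):
    """Return {TextLine id: [(x, y), ...]} for every Baseline in a PAGE-XML file.

    `xml_source` is a path or a file object. The "{*}" wildcard (Python 3.8+)
    matches any namespace, so the PAGE schema version does not matter.
    """
    root = ET.parse(xml_source).getroot()
    baselines = {}
    for line in root.findall(".//{*}TextLine"):
        baseline = line.find("{*}Baseline")
        if baseline is None:
            continue  # text line without a baseline
        points = [tuple(int(float(v)) for v in pair.split(","))
                  for pair in baseline.get("points", "").split()]
        baselines[line.get("id")] = points
    return baselines
```

Calling read_baselines on one of the files in ./work/results/test/ gives a dictionary that maps each line id to its list of baseline coordinates.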

We recommend Transkribus or nw-page-editor to visualize and edit PAGE-XML files.

For details about the arguments and the config file, see docs or run python P2PaLA.py -h.
For more detailed examples, see egs:
- Bozen dataset
- cBAD complex competition dataset
- OHG dataset

License

GNU General Public License v3.0. See LICENSE for the full text.

Acknowledgments

The code is inspired by pix2pix and pytorch-CycleGAN-and-pix2pix.


Owner

Lorenzo Quirós Díaz
