Ground truth data for the Optical Character Recognition of Historical Classical Commentaries.

Last update: Sep 08, 2022

Overview

OCR Ground Truth for Historical Commentaries

The dataset OCR ground truth for historical commentaries (GT4HistComment) was created from the public domain subset of scholarly commentaries on Sophocles' Ajax. Its main goal is to enable the evaluation of OCR quality on printed materials that mix Latin and polytonic Greek scripts. It consists of five 19th-century commentaries written in German, English, and Latin, for a total of 3,356 GT lines.

Data

GT4HistComment is contained in data/, where each sub-folder corresponds to a different publication (i.e. commentary). For each commentary we provide the following data:

<commentary_id>/GT-pairs: pairs of image/text files for each GT line
<commentary_id>/imgs: original images on which the OCR was performed
<commentary_id>/<commentary_id>_olr.tsv: OLR annotations with image region coordinates and layout type ground truth label
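
As an illustration of how the GT-pairs folder could be consumed, here is a minimal Python sketch that matches each line image to its transcription. It assumes that the image and text file of a pair share the same file stem and that transcriptions are stored as UTF-8 .txt files; neither the naming scheme nor the extensions are confirmed by this README, and the helper name iter_gt_pairs is hypothetical.

```python
from pathlib import Path

# Assumed extensions: adjust them to the actual files found in GT-pairs.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}
TEXT_SUFFIX = ".txt"


def iter_gt_pairs(gt_dir):
    """Yield (image_path, transcription) for each image that has a same-stem text file."""
    gt_dir = Path(gt_dir)
    for image_path in sorted(gt_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        text_path = image_path.with_suffix(TEXT_SUFFIX)
        if text_path.exists():
            # Strip the trailing newline so the text matches the printed line.
            yield image_path, text_path.read_text(encoding="utf-8").strip()
```

For example, list(iter_gt_pairs("data/bsb10234118/GT-pairs")) would collect all pairs for one commentary, provided the assumptions above hold. Images without a matching text file are skipped silently.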

The OCR output produced by the Kraken + Ciaconna pipeline was manually corrected by a pool of annotators using the Lace platform. In order to ensure the quality of the ground truth datasets, an additional verification of all transcriptions made in Lace was carried out by an annotator on line-by-line pairs of image and corresponding text.
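
Since the dataset is meant for evaluating OCR quality, the following sketch shows one common way to compare an OCR output line against its ground truth line: the character error rate (CER), i.e. the Levenshtein distance divided by the length of the reference. This is a generic illustration, not necessarily the metric or implementation used by the dataset authors. Normalizing both strings to Unicode NFC is an assumption made here because polytonic Greek characters can be encoded either precomposed or as base letter plus combining marks.

```python
import unicodedata


def levenshtein(a, b):
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate of hypothesis against reference, after NFC normalization."""
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference line must not be empty")
    return levenshtein(ref, hyp) / len(ref)
```

With this definition, an OCR line that drops the diacritics of a single Greek letter counts as one substitution, which is why the normalization step matters for mixed Latin and polytonic Greek text.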

Commentary overview

ID	Commentator	Year	Languages	Image source
bsb10234118	Lobeck [1]	1835	Greek, Latin	BSB
sophokle1v3soph	Schneidewin [2]	1853	Greek, German	Internet Archive
cu31924087948174	Campbell [3]	1881	Greek, English	Internet Archive
sophoclesplaysa05campgoog	Jebb [4]	1896	Greek, English	Internet Archive
Wecklein1894	Wecklein [5]	1894	Greek, German	internal

Stats

Line, word, and character counts for each commentary are given in the following table. Detailed counts for each region can be found here.

ID	Commentator	Type	lines	words	all chars	greek chars
bsb10234118	Lobeck	training	574	2943	16081	5344
bsb10234118	Lobeck	groundtruth	202	1491	7917	2786
sophokle1v3soph	Schneidewin	training	583	2970	16112	3269
sophokle1v3soph	Schneidewin	groundtruth	382	1599	8436	2191
cu31924087948174	Campbell	groundtruth	464	2987	14291	3566
sophoclesplaysa05campgoog	Jebb	training	561	4102	19141	5314
sophoclesplaysa05campgoog	Jebb	groundtruth	324	2418	10986	2805
Wecklein1894	Wecklein	groundtruth	211	1912	9556	3268
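
The counts in this table can be summarized with a few lines of Python. The sketch below copies the rows from the table and derives the number of lines per type and the share of Greek characters per row; the names STATS, lines_by_type and greek_share are introduced here for illustration only and are not part of the repository.

```python
from collections import defaultdict

# (commentator, type, lines, words, all_chars, greek_chars), copied from the table above.
STATS = [
    ("Lobeck", "training", 574, 2943, 16081, 5344),
    ("Lobeck", "groundtruth", 202, 1491, 7917, 2786),
    ("Schneidewin", "training", 583, 2970, 16112, 3269),
    ("Schneidewin", "groundtruth", 382, 1599, 8436, 2191),
    ("Campbell", "groundtruth", 464, 2987, 14291, 3566),
    ("Jebb", "training", 561, 4102, 19141, 5314),
    ("Jebb", "groundtruth", 324, 2418, 10986, 2805),
    ("Wecklein", "groundtruth", 211, 1912, 9556, 3268),
]


def lines_by_type(rows):
    """Sum the number of lines for each data type (training, groundtruth)."""
    totals = defaultdict(int)
    for _name, kind, lines, *_rest in rows:
        totals[kind] += lines
    return dict(totals)


def greek_share(rows):
    """Map (commentator, type) to the fraction of characters that are Greek."""
    return {
        (name, kind): greek / chars
        for name, kind, _lines, _words, chars, greek in rows
    }


if __name__ == "__main__":
    print(lines_by_type(STATS))
    for key, share in greek_share(STATS).items():
        print(key, f"{share:.1%}")
```

The Greek share is a quick way to see how strongly each commentary mixes scripts, which is the property the dataset is designed to test.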

Commentary editions used:

[1] Lobeck, Christian August. 1835. Sophoclis Aiax. Leipzig: Weidmann.
[2] Sophokles. 1853. Sophokles Erklaert von F. W. Schneidewin. Erstes Baendchen: Aias. Philoktetes. Edited by Friedrich Wilhelm Schneidewin. Leipzig: Weidmann.
[3] Campbell, Lewis. 1881. Sophocles. Oxford: Clarendon Press.
[4] Jebb, Richard Claverhouse. 1896. Sophocles: The Plays and Fragments. London: Cambridge University Press.
[5] Wecklein, Nikolaus. 1894. Sophokleus Aias. München: Lindauer.

Citation

If you use this dataset in your research, please cite the following publication:

@inproceedings{romanello_optical_2021,
  title = {Optical {{Character Recognition}} of 19th {{Century Classical Commentaries}}: The {{Current State}} of {{Affairs}}},
  booktitle = {The 6th {{International Workshop}} on {{Historical Document Imaging}} and {{Processing}} ({{HIP}} '21)},
  author = {Romanello, Matteo and Najem-Meyer, Sven and Robertson, Bruce},
  year = {2021},
  publisher = {{Association for Computing Machinery}},
  address = {{Lausanne}},
  doi = {10.1145/3476887.3476911}
}

Acknowledgements

Data in this repository were produced in the context of the Ajax Multi-Commentary project, funded by the Swiss National Science Foundation under an Ambizione grant PZ00P1_186033.

Contributors: Carla Amaya (UNIL), Sven Najem-Meyer (EPFL), Matteo Romanello (UNIL), Bruce Robertson (Mount Allison University).

Comments

Adds line, word and char counts to README.md

Adds a table to README.md as suggested by reviewer 1. The table also links to a more complete table, itself a public version of the spreadsheet OCR evaluation and stats!detailed_counts. Note that the publishable version is an external reference to our private version, meaning that updating the latter will also update the former.

opened by sven-nm

Pages to exclude - OCR

The page contains the metrical schemes of the passages. As a result, the OCR does not recognize them; moreover, the correction of the OCR has not been completed.

Pages to exclude: sophoclesplaysa05campgoog_0072.png (Jebb, p. 72)

opened by camaya28

Releases(v1.0)

v1.0(Sep 24, 2021)

Owner

Ajax Multi-Commentary
