Recent progress on the OCRopus OCR system

Authors: Thomas Breuel
Affiliations: U. Kaiserslautern and DFKI
Venue: Proceedings of the International Workshop on Multilingual OCR
Year: 2009

Abstract

OCRopus is an open source OCR system developed for book capture and digital library applications. It is designed to be a multilingual system in which all components are easily pluggable and replaceable. In this paper, I describe recent progress, ongoing work, and preliminary results in the development of OCRopus, including the new component model, a new line recognizer, a new set of decoders, and language modeling tools.