A design of a preprocessing framework for large database of historical documents

Authors:
Ines Ben Messaoud;Haikal El Abed;Volker Märgner;Hamid Amiri
Affiliations:
Laboratoire des Systèmes et Traitement de Signal, LSTS, Ecole Nationale d'Ingénieurs de Tunis, ENIT, Tunis, Tunisia;Technische Universität Braunschweig, Braunschweig, Germany;Technische Universität Braunschweig, Braunschweig, Germany;Laboratoire des Systèmes et Traitement de Signal, LSTS, Ecole Nationale d'Ingénieurs de Tunis, ENIT, Tunis, Tunisia
Venue:
Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
Year:
2011

Abstract

Document preprocessing aims to facilitate text recognition and document indexing. Analyzing historical documents is particularly challenging because most of them are noisy and exhibit many kinds of degradation. In this paper we propose a preprocessing framework for a large dataset of historical documents. The framework consists of two phases: selection and evaluation. In the selection phase, one or more preprocessing methods are assigned to each book in the database. In the evaluation phase, the results of the selection are validated. Experiments were conducted on printed and handwritten documents taken from the Google Books and Bayerische Staatsbibliothek databases, respectively. The evaluation results are promising.
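The abstract names the two phases but does not specify how methods are chosen or validated. The following is only a minimal sketch of what a per-book selection and evaluation loop could look like, under stated assumptions: each candidate preprocessing method is assumed to receive a per-page quality score (higher is better, for example a measure against ground truth); "one or multiple methods" is modelled as keeping every method within a `tolerance` of the best mean score on a book's sample pages; and validation is modelled as re-checking that rule on held-out pages. The scorer, the tolerance rule, the sample/validation split and all names (`Book`, `select_methods`, `validate_selection`, `run_framework`, `method_a`, ...) are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer: maps (method name, page) to a quality score, higher is better.
# The abstract does not define how methods are scored; this is an assumption.
Scorer = Callable[[str, object], float]


@dataclass
class Book:
    name: str
    sample_pages: Sequence[object]      # hypothetical: pages used in the selection phase
    validation_pages: Sequence[object]  # hypothetical: held-out pages used in the evaluation phase


def mean_scores(methods: Sequence[str], pages: Sequence[object], score: Scorer) -> Dict[str, float]:
    """Average score of every candidate method over a set of pages."""
    return {m: mean(score(m, p) for p in pages) for m in methods}


def select_methods(book: Book, methods: Sequence[str], score: Scorer,
                   tolerance: float = 0.01) -> List[str]:
    """Selection phase (assumed rule): keep every method whose mean score on the
    sample pages is within `tolerance` of the best, so a book may get one or several."""
    scores = mean_scores(methods, book.sample_pages, score)
    best = max(scores.values())
    return [m for m in methods if scores[m] >= best - tolerance]


def validate_selection(book: Book, selected: Sequence[str], methods: Sequence[str],
                       score: Scorer, tolerance: float = 0.01) -> bool:
    """Evaluation phase (assumed rule): accept the selection only if every selected
    method is still within `tolerance` of the best candidate on the held-out pages."""
    scores = mean_scores(methods, book.validation_pages, score)
    best = max(scores.values())
    return all(scores[m] >= best - tolerance for m in selected)


def run_framework(books: Sequence[Book], methods: Sequence[str], score: Scorer,
                  tolerance: float = 0.01) -> Dict[str, Tuple[List[str], bool]]:
    """Run both phases over the whole database: {book name: (selected, validated)}."""
    results: Dict[str, Tuple[List[str], bool]] = {}
    for book in books:
        selected = select_methods(book, methods, score, tolerance)
        results[book.name] = (selected, validate_selection(book, selected, methods, score, tolerance))
    return results


if __name__ == "__main__":
    # Made-up toy scores, one entry per (method, page id); not data from the paper.
    table = {
        ("method_a", "p1"): 0.90, ("method_a", "p2"): 0.86, ("method_a", "p3"): 0.88,
        ("method_b", "p1"): 0.89, ("method_b", "p2"): 0.86, ("method_b", "p3"): 0.70,
        ("method_c", "p1"): 0.60, ("method_c", "p2"): 0.65, ("method_c", "p3"): 0.62,
    }
    book = Book("book_1", sample_pages=["p1", "p2"], validation_pages=["p3"])
    print(run_framework([book], ["method_a", "method_b", "method_c"],
                        lambda m, p: table[(m, p)]))
```

In this toy run, both `method_a` and `method_b` are selected from the sample pages, but the selection is rejected because `method_b` scores poorly on the held-out page. Keeping the validation pages separate from the selection pages is a design choice of the sketch: it prevents the evaluation phase from simply repeating the selection phase's measurement.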