A design of a preprocessing framework for large database of historical documents

  • Authors:
  • Ines Ben Messaoud;Haikal El Abed;Volker Märgner;Hamid Amiri

  • Affiliations:
  • Laboratoire des Systèmes et Traitement de Signal, LSTS, Ecole Nationale d'Ingénieurs de Tunis, ENIT, Tunis, Tunisia;Technische Universität, Braunschweig, Braunschweig, Germany;Technische Universität, Braunschweig, Braunschweig, Germany;Laboratoire des Systèmes et Traitement de Signal, LSTS, Ecole Nationale d'Ingénieurs de Tunis, ENIT, Tunis, Tunisia

  • Venue:
  • Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
  • Year:
  • 2011

Abstract

The objective of document preprocessing is to facilitate text recognition and document indexing. Analyzing historical documents is a major challenge because most of these documents are noisy and exhibit many degradations. In this paper we propose a preprocessing framework for a large dataset of historical documents. The proposed framework consists of two phases: selection and evaluation. During the selection phase, one or more preprocessing methods are assigned to each book of the database. The selection results are then validated during the evaluation phase. Experiments are conducted on printed and handwritten documents taken from the Google Books and Bayerische Staatsbibliothek databases, respectively. The results obtained during the evaluation are very promising.
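The abstract does not say which preprocessing methods or validation criteria the framework uses. As a rough illustration only, the following Python sketch shows one way to organize a two-phase loop over books, where a selection step is followed by an evaluation step. Everything in it is a hypothetical choice made for this example, not the authors' method. That includes the two binarization methods (`fixed_threshold`, `mean_threshold`), the pixel-accuracy score, the assumption of binary ground-truth pages, and the `min_acc` cutoff.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: grayscale page as rows of 0..255 values.
Image = List[List[int]]
Page = Tuple[Image, Image]  # (input page, binary reference), assumed available


def fixed_threshold(img: Image, t: int = 128) -> Image:
    """Binarize with a fixed global threshold (0 = ink, 255 = background)."""
    return [[0 if p < t else 255 for p in row] for row in img]


def mean_threshold(img: Image) -> Image:
    """Binarize using the page's mean intensity as the threshold."""
    t = mean(p for row in img for p in row)
    return [[0 if p < t else 255 for p in row] for row in img]


# Hypothetical pool of candidate preprocessing methods.
METHODS: Dict[str, Callable[[Image], Image]] = {
    "fixed": fixed_threshold,
    "mean": mean_threshold,
}


def pixel_accuracy(pred: Image, truth: Image) -> float:
    """Fraction of pixels that agree with the binary reference."""
    total = sum(len(row) for row in truth)
    hits = sum(p == q for pr, tr in zip(pred, truth) for p, q in zip(pr, tr))
    return hits / total


def select_methods(sample: List[Page], top_k: int = 1) -> List[str]:
    """Selection phase: rank methods by mean accuracy on a book's sample pages."""
    scores = {
        name: mean(pixel_accuracy(fn(img), gt) for img, gt in sample)
        for name, fn in METHODS.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def evaluate(methods: List[str], held_out: List[Page], min_acc: float = 0.9) -> bool:
    """Evaluation phase: accept the selection if every chosen method
    reaches min_acc mean accuracy on held-out pages of the same book."""
    return all(
        mean(pixel_accuracy(METHODS[m](img), gt) for img, gt in held_out) >= min_acc
        for m in methods
    )


def run_framework(books: Dict[str, Tuple[List[Page], List[Page]]],
                  top_k: int = 1, min_acc: float = 0.9) -> Dict[str, Tuple[List[str], bool]]:
    """Apply selection then evaluation independently to each book."""
    report = {}
    for book, (sample, held_out) in books.items():
        chosen = select_methods(sample, top_k)
        report[book] = (chosen, evaluate(chosen, held_out, min_acc))
    return report


if __name__ == "__main__":
    # A dark page: background around 100, ink around 20.
    page = [[100, 20, 100], [100, 100, 20]]
    truth = [[255, 0, 255], [255, 255, 0]]
    print(run_framework({"book_a": ([(page, truth)], [(page, truth)])}))
```

On this dark toy page, the fixed threshold of 128 labels every pixel as ink. The selection phase therefore prefers `mean_threshold` for that book. Ranking methods per book mirrors the abstract's book-level assignment. Setting `top_k` above 1 models the case where several methods are kept for a single book.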