Analysis and recognition of highly degraded printed characters

  • Authors:
  • Anna Tonazzini;Stefano Vezzosi;Luigi Bedini

  • Affiliations:
  • Istituto di Scienza e Tecnologie dell’Informazione, Area della Ricerca CNR di Pisa, Via G. Moruzzi, 1, 56124, Pisa, Italy;Istituto di Scienza e Tecnologie dell’Informazione, Area della Ricerca CNR di Pisa, Via G. Moruzzi, 1, 56124, Pisa, Italy;Istituto di Scienza e Tecnologie dell’Informazione, Area della Ricerca CNR di Pisa, Via G. Moruzzi, 1, 56124, Pisa, Italy

  • Venue:
  • International Journal on Document Analysis and Recognition
  • Year:
  • 2003

Abstract

This paper proposes an integrated system for processing and analyzing highly degraded printed documents in order to recognize their text characters. Ancient printed texts are considered as a case study. The system comprises several blocks that operate sequentially on a single page of the document. First, the background noise is reduced by wavelet-based decomposition and filtering. Next, the text lines are detected, extracted, and segmented by a simple and fast adaptive thresholding into blobs corresponding to characters. The blobs are then analyzed by a feedforward multilayer neural network trained with a back-propagation algorithm. For each character, the probability associated with the recognition is used as a discriminating parameter that determines the automatic activation of a feedback process, which returns the system to a block for refining the segmentation. This block acts only on the small portions of the text where the recognition cannot be relied on. It makes use of blind deconvolution and Markov random field (MRF) based segmentation techniques, whose high complexity is greatly reduced when they are applied to a few subimages of small size. The experimental results show that the proposed system performs a very precise segmentation of the characters and, in turn, a highly effective recognition of even strongly degraded texts.
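
The confidence-gated feedback described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_prob` threshold of 0.9, and the `max_passes` limit are all assumptions made for illustration, and the `classify` and `refine` callables are placeholders for the neural network and for the blind deconvolution and MRF-based refinement block, neither of which is reproduced here.

```python
from typing import Callable, List, Sequence, Tuple

Blob = Sequence[Sequence[int]]   # hypothetical: a small grayscale subimage
Prediction = Tuple[str, float]   # (character label, recognition probability)


def recognize_with_feedback(
    blobs: List[Blob],
    classify: Callable[[Blob], Prediction],
    refine: Callable[[Blob], List[Blob]],
    min_prob: float = 0.9,   # assumed threshold, not taken from the paper
    max_passes: int = 1,     # assumed limit on feedback passes per blob
) -> List[Prediction]:
    """Classify each blob; send only low-confidence blobs back for refinement."""
    results: List[Prediction] = []
    for blob in blobs:
        results.extend(
            _recognize_blob(blob, classify, refine, min_prob, max_passes)
        )
    return results


def _recognize_blob(blob, classify, refine, min_prob, passes_left):
    label, prob = classify(blob)
    # Accept the result if it is reliable or no feedback passes remain.
    if prob >= min_prob or passes_left == 0:
        return [(label, prob)]
    # Otherwise re-segment this small subimage only (the costly step).
    sub_blobs = refine(blob)
    if not sub_blobs:
        return [(label, prob)]
    out = []
    for sub in sub_blobs:
        out.extend(
            _recognize_blob(sub, classify, refine, min_prob, passes_left - 1)
        )
    return out
```

Because `refine` is called only for blobs whose probability falls below the threshold, the expensive refinement runs on a few small subimages rather than on the whole page, which is the cost argument the abstract makes.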