Combining statistical and geometrical classifiers for text extraction in multispectral document images

  • Authors:
  • Rachid Hedjam;Mohamed Cheriet

  • Affiliations:
  • Synchromedia Lab. for Multimedia Communication in Telepresence, Montreal, Quebec, Canada;Synchromedia Lab. for Multimedia Communication in Telepresence, Montreal, Quebec, Canada

  • Venue:
  • Proceedings of the 2011 Workshop on Historical Document Imaging and Processing
  • Year:
  • 2011

Abstract

Extracting the original text from historical document images is essential to the preservation of cultural heritage. In recent decades, many image processing techniques have been developed to separate the main text from the document image background, most of which operate on grayscale images. In this paper, we propose a new text extraction method designed for multi-spectral document images (MSDI), based on a combination of two classifiers, one statistical and the other geometric. Our main contribution is a novel technique for feature extraction and classifier weighting in the context of MSDI. The results, compared against those of two binarization methods, are promising.