Automatic Feature Selection with Applications to Script Identification of Degraded Documents

  • Authors:
  • Vitaly Ablavsky; Mark R. Stevens

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
  • Year:
  • 2003

Abstract

Current approaches to script identification rely on hand-selected features and often require processing a significant part of the document to achieve reliable identification. We present an approach that applies a large pool of image features to a small training sample and uses subset feature selection techniques to automatically select a subset with the most discriminating power. At run time we use a classifier coupled with an evidence accumulation engine to report a script label once a preset likelihood threshold has been reached. We apply the system to a diverse corpus of printed Russian and English documents that suffer from common degradation problems. Our validation study shows promising results both in terms of the script identification accuracy and the ability to identify script on the scale of individual words and text lines.
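
The abstract does not specify which selection technique or accumulation rule the authors use, so the following is only a minimal sketch of the two ideas under stated assumptions. It assumes greedy forward selection driven by a caller-supplied scoring function, and evidence accumulation as a running sum of per-observation log-likelihood ratios that stops at a symmetric threshold. The names `forward_select`, `accumulate_evidence`, `score`, and `threshold` are hypothetical and do not come from the paper.

```python
def forward_select(n_features, score, k):
    """Greedy forward selection (an assumed technique, not the paper's).

    Starting from an empty subset, repeatedly add the feature index whose
    inclusion yields the highest score(subset), until k features are chosen.
    `score` is a caller-supplied function mapping a list of indices to a number.
    """
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def accumulate_evidence(log_likelihood_ratios, threshold=3.0,
                        positive_label="Russian", negative_label="English"):
    """Sequential evidence accumulation (an assumed rule, not the paper's).

    Each value is a per-observation log-likelihood ratio: positive values
    favor `positive_label`, negative values favor `negative_label`.
    Returns (label, observations_used); label is None if the running sum
    never reaches +threshold or -threshold.
    """
    total = 0.0
    n = 0
    for llr in log_likelihood_ratios:
        n += 1
        total += llr
        if total >= threshold:
            return positive_label, n
        if total <= -threshold:
            return negative_label, n
    return None, n
```

In this sketch, a lower `threshold` yields a label after fewer observations (for example, within a single word) at the cost of more errors, while a higher one waits for more evidence before committing.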