Matching word images for content-based retrieval from printed document images

  • Authors: Million Meshesha; C. V. Jawahar

  • Affiliations: International Institute of Information Technology, Center for Visual Information Technology, 500 032, Hyderabad, India (both authors)

  • Venue: International Journal on Document Analysis and Recognition
  • Year: 2008

Abstract

As large quantities of document images are archived by digital libraries, there is a need for efficient search strategies that make them available according to users' information needs. In this paper, we propose an effective word image matching scheme that achieves high performance in the presence of script variability, printing variation, degradation and word-form variants. A novel partial matching algorithm is designed for morphological matching of word-form variants in a language. We formulate a feature extraction scheme that extracts local features by scanning vertical strips of the word image and combines them automatically based on their discriminatory potential. We present a detailed performance analysis of the proposed approach on English, Amharic and Hindi documents.
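
To make the idea of scanning vertical strips concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binarized word image given as a list of rows (1 for ink, 0 for background), treats each pixel column as one strip, and uses three illustrative features per strip. The abstract does not specify the feature set, the automatic feature combination, or the partial matching algorithm; plain dynamic time warping (DTW) is substituted here as a generic way to compare two strip sequences of different widths. The names `column_features` and `dtw_distance` are hypothetical.

```python
from math import sqrt


def column_features(image):
    """Return one feature tuple per pixel column of a binary word image.

    `image` is a list of rows; 1 marks ink, 0 marks background.
    The features (ink fraction, topmost ink row, bottommost ink row,
    each scaled by the image height) are illustrative assumptions,
    not the feature set used in the paper.
    """
    height = len(image)
    width = len(image[0])
    features = []
    for x in range(width):
        ink_rows = [y for y in range(height) if image[y][x]]
        if ink_rows:
            density = len(ink_rows) / height
            upper = ink_rows[0] / height
            lower = ink_rows[-1] / height
        else:
            # Empty column: no ink, so use fixed sentinel profile values.
            density, upper, lower = 0.0, 1.0, 0.0
        features.append((density, upper, lower))
    return features


def dtw_distance(a, b):
    """Length-normalized DTW cost between two feature sequences.

    Plain DTW stands in for the paper's matching step (an assumption).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two strip feature tuples.
            d = sqrt(sum((p - q) ** 2 for p, q in zip(a[i - 1], b[j - 1])))
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


# Toy 3x4 "word image" and a horizontally stretched copy of it.
word = [
    [0, 1, 0, 1],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
]
stretched = [[pixel for pixel in row for _ in range(2)] for row in word]
other = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

same = dtw_distance(column_features(word), column_features(stretched))
different = dtw_distance(column_features(word), column_features(other))
```

In this toy setup the stretched copy aligns with the original at zero cost because DTW absorbs the repeated columns, while the unrelated image yields a positive cost. A retrieval system along these lines would rank candidate word images by such a cost; handling word-form variants would additionally require matching only part of the sequence, which this sketch does not attempt.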