Amharic document image retrieval using morphological coding

  • Authors:
  • Tilahun Yeshambel;Yaregal Assabie

  • Affiliations:
  • University of Gondar, Gondar, Ethiopia;Addis Ababa University, Addis Ababa, Ethiopia

  • Venue:
  • Proceedings of the International Conference on Management of Emergent Digital EcoSystems
  • Year:
  • 2012

Abstract

This paper presents a novel approach to Amharic document image retrieval that takes the morphology of the language into account. Beyond the general problems of document image retrieval, Amharic poses further difficulties for modeling retrieval systems because of its complex morphology. We encode the morphological characteristics of the language to improve query formulation and image database indexing. A morphological generator automatically synthesizes surface words from a lexicon of Amharic root forms, yielding surface word image features coded with their respective root forms. Using this morphological coding, document word images and query terms are both represented by their root forms. During indexing and query formulation, cosine similarity is used to compare word image features extracted from the vertical projection, upper bound profile, and lower bound profile. The proposed system is tested on real-life Amharic documents collected from various sources, and experimental results are reported.
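
The feature comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the profiles precisely, so the definitions below are assumptions (a binarized word image given as rows of 0/1 with 1 marking ink, the upper profile as the distance from the top edge to the first ink pixel in each column, the lower profile as the distance from the bottom edge to the last ink pixel). All function names and the `coded_index` structure are hypothetical, and word images are assumed to have been normalized to a common size beforehand.

```python
import math

def vertical_projection(img):
    """Count the ink pixels (1s) in each column."""
    return [sum(col) for col in zip(*img)]

def upper_profile(img):
    """Distance from the top edge to the first ink pixel per column
    (image height if the column is empty). Assumed definition."""
    height = len(img)
    return [next((r for r, v in enumerate(col) if v), height)
            for col in zip(*img)]

def lower_profile(img):
    """Distance from the bottom edge to the last ink pixel per column
    (image height if the column is empty). Assumed definition."""
    height = len(img)
    return [next((r for r, v in enumerate(reversed(col)) if v), height)
            for col in zip(*img)]

def features(img):
    """Concatenate the three profiles into one feature vector."""
    return vertical_projection(img) + upper_profile(img) + lower_profile(img)

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def best_root(query_features, coded_index):
    """Return the root label of the most similar indexed feature vector.

    coded_index is a hypothetical list of (feature_vector, root_label)
    pairs standing in for surface word images coded with their roots.
    """
    _, root = max(coded_index,
                  key=lambda entry: cosine_similarity(query_features, entry[0]))
    return root
```

In this sketch, a query word image would be converted with `features` and passed to `best_root` together with an index built from generated surface words, so that both the query and the document words end up represented by a root label.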