Retrieval from document image collections

  • Authors:
  • A. Balasubramanian, Million Meshesha, C. V. Jawahar

  • Affiliations:
  • Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, India (all three authors)

  • Venue:
  • DAS'06: Proceedings of the 7th International Conference on Document Analysis Systems
  • Year:
  • 2006

Abstract

This paper presents a system for retrieving relevant documents from large collections of document images. We achieve effective search and retrieval over a large collection of printed document images by matching image features at the word level. Words are represented using profile-based and shape-based features. A novel partial matching scheme based on dynamic time warping (DTW) handles morphologically variant words, which is useful for grouping similar words together during indexing. The system supports cross-lingual search using OM-Trans transliteration and a dictionary-based approach. The paper also addresses system-level issues in retrieval, such as scalability and effective delivery of results.
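The abstract does not specify the exact feature set or the partial-matching rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes four per-column profile features of a binarized word image (vertical projection, upper profile, lower profile, and background-to-ink transition count) and uses an open-end DTW variant as a stand-in for partial matching: the alignment must consume the whole query but may stop early in the candidate, so a stem can match the start of a longer inflected word. The names `profile_features` and `dtw` are hypothetical, and this is not presented as the authors' scheme.

```python
import numpy as np


def profile_features(word_img):
    """Per-column features of a binary word image (1 = ink, 0 = background).

    Hypothetical feature set for illustration: projection, upper profile,
    lower profile and 0->1 transition count, each normalized by image height.
    """
    img = np.asarray(word_img, dtype=int)
    h, w = img.shape
    feats = np.zeros((w, 4), dtype=float)
    for c in range(w):
        col = img[:, c]
        ink = np.flatnonzero(col)
        feats[c, 0] = col.sum() / h  # vertical projection profile
        if ink.size:
            feats[c, 1] = ink[0] / h  # upper profile: first ink row from the top
            feats[c, 2] = (h - 1 - ink[-1]) / h  # lower profile: gap below last ink row
        else:
            feats[c, 1] = feats[c, 2] = 1.0  # empty column: profiles at maximum
        # Background-to-ink transitions when scanning the column top to bottom
        feats[c, 3] = np.count_nonzero(np.diff(col) == 1) / h
    return feats


def dtw(a, b, open_end=False):
    """DTW distance between feature sequences a (n x d) and b (m x d).

    With open_end=True (assumed partial-matching variant), all of `a` must be
    aligned but the path may end at any column of `b`.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # Euclidean local cost
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    if open_end:
        return float(np.min(D[n, 1:]))  # best end point anywhere in b
    return float(D[n, m])
```

On toy one-dimensional sequences, a three-column "stem" compared against the same stem followed by two extra columns scores zero under open-end matching but a positive cost under full DTW, which is the behaviour that lets inflected variants be grouped with their stem. The quadratic cost of the dynamic-programming table is the usual reason practical systems add a warping-window constraint (for example a Sakoe-Chiba band) when indexing large collections.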