Document Image Retrieval Based on 2D Density Distributions of Terms with Pseudo Relevance Feedback

  • Authors:
  • Koichi Kise;Yin Wuotang;Keinosuke Matsumoto

  • Affiliations:
  • -;-;-

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 1
  • Year:
  • 2003

Abstract

Document image retrieval is the task of retrieving document images relevant to a user's query. Most existing methods based on word-level indexing rely on the representation called "bag of words", which originated in the field of information retrieval. This paper presents a new representation of documents that utilizes additional information about the location of words in pages so as to improve retrieval performance. We consider pages to be relevant to a query if they contain its terms densely. This notion is embodied as density distributions of terms calculated in the proposed method. Its performance is improved with the help of "pseudo relevance feedback", i.e., a method of expanding a query by analyzing pages. Experimental results on English document images show that the proposed method is superior to conventional methods of electronic document retrieval at recall levels 0.0-0.6.
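
The two ideas in the abstract, scoring a page by how densely the query terms occur on it and then expanding the query from the top-ranked pages, can be illustrated with a minimal sketch. The abstract does not give the window function, the scoring formula, or the term-selection rule, so everything below is an assumption made for illustration: the page is a coarse grid of cells, the density uses a square box window of hypothetical size `radius`, a page's score is the peak of the summed densities, and expansion terms are picked by raw frequency. All function names are hypothetical.

```python
from collections import Counter


def term_density(positions, width, height, radius=1):
    """2D density of one term: each occurrence adds 1 to every grid cell
    inside a square box window around it (assumed window shape)."""
    grid = [[0.0] * width for _ in range(height)]
    for x, y in positions:
        for yy in range(max(0, y - radius), min(height, y + radius + 1)):
            for xx in range(max(0, x - radius), min(width, x + radius + 1)):
                grid[yy][xx] += 1.0
    return grid


def page_score(page, query_terms, width, height, radius=1):
    """Score a page by the peak of the summed query-term densities
    (assumed scoring rule). `page` maps term -> list of (x, y) cells."""
    total = [[0.0] * width for _ in range(height)]
    for term in query_terms:
        density = term_density(page.get(term, []), width, height, radius)
        for y in range(height):
            for x in range(width):
                total[y][x] += density[y][x]
    return max(max(row) for row in total)


def expand_query(pages, query_terms, width, height, top_pages=1, extra_terms=1):
    """Pseudo relevance feedback: treat the top-ranked pages as relevant and
    add their most frequent non-query terms (assumed selection rule)."""
    ranked = sorted(
        pages,
        key=lambda p: page_score(p, query_terms, width, height),
        reverse=True,
    )
    counts = Counter()
    for page in ranked[:top_pages]:
        for term, positions in page.items():
            if term not in query_terms:
                counts[term] += len(positions)
    return list(query_terms) + [t for t, _ in counts.most_common(extra_terms)]


# Toy pages on a 5x5 grid: query terms are adjacent on `dense`, far apart on `sparse`.
dense = {"image": [(1, 1)], "retrieval": [(2, 1)], "density": [(1, 2), (2, 2)]}
sparse = {"image": [(0, 0)], "retrieval": [(4, 4)], "noise": [(2, 2)]}
query = ["image", "retrieval"]

dense_score = page_score(dense, query, 5, 5)
sparse_score = page_score(sparse, query, 5, 5)
expanded = expand_query([sparse, dense], query, 5, 5)
```

In this toy setup the page whose query terms sit in neighboring cells scores higher than the page where they are scattered, which a bag-of-words representation could not distinguish because both pages contain each query term once.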