A multi-level matching method with hybrid similarity for document retrieval

  • Authors:
  • Haijun Zhang;Tommy W. S. Chow

  • Affiliations:
  • Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong;Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 12.05

Visualization

Abstract

This paper presents a multi-level matching method for document retrieval (DR) using a hybrid document similarity. Documents are represented by multi-level structure including document level and paragraph level. This multi-level-structured representation is designed to model underlying semantics in a more flexible and accurate way that the conventional flat term histograms find it hard to cope with. The matching between documents is then transformed into an optimization problem with Earth Mover's Distance (EMD). A hybrid similarity is used to synthesize the global and local semantics in documents to improve the retrieval accuracy. In this paper, we have performed extensive experimental study and verification. The results suggest that the proposed method works well for lengthy documents with evident spatial distributions of terms.