Controlling overlap in content-oriented XML retrieval

  • Authors:
  • Charles L. A. Clarke

  • Affiliations:
  • University of Waterloo, Canada

  • Venue:
  • Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

The direct application of standard ranking techniques to retrieve individual elements from a collection of XML documents often produces a result set in which the top ranks are dominated by a large number of elements taken from a small number of highly relevant documents. This paper presents and evaluates an algorithm that re-ranks this result set, with the aim of minimizing redundant content while preserving the benefits of element retrieval, including the benefit of identifying topic-focused components contained within relevant documents. The test collection developed by the INitiative for the Evaluation of XML Retrieval (INEX) forms the basis for the evaluation.