The Accessibility Dimension for Structured Document Retrieval

  • Authors:
  • Thomas Rölleke;Mounia Lalmas;Gabriella Kazai;Ian Ruthven;Stefan Quicker

  • Venue:
  • Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval
  • Year:
  • 2002

Abstract

Structured document retrieval aims to retrieve the document components that best satisfy a query, instead of merely retrieving pre-defined document units. This paper reports on an investigation of a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, and carried out experiments on these collections. The analysis of the experiments provides estimates of the acc values.
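
To make the idea of combining tf, idf and acc concrete, here is a minimal Python sketch. It is not the paper's probabilistic relational algebra formulation, which the abstract does not spell out. It assumes, purely for illustration, that a component's score is its own tf-idf weight plus the scores of its children scaled by a single constant acc value. The names `Component`, `all_components`, `idf` and `score`, and the additive propagation rule itself, are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a document tree (e.g. document, section, paragraph)."""
    name: str
    terms: list = field(default_factory=list)      # terms occurring directly in this node
    children: list = field(default_factory=list)   # nested sub-components


def all_components(root):
    """Return the root and every nested component, depth first."""
    nodes = [root]
    for child in root.children:
        nodes.extend(all_components(child))
    return nodes


def idf(term, components):
    """Classical idf over components: log(N / n_t), or 0.0 if the term is unseen."""
    n_t = sum(1 for c in components if term in c.terms)
    return math.log(len(components) / n_t) if n_t else 0.0


def score(node, query, components, acc):
    """Assumed rule: own tf-idf plus acc-weighted scores of the children.

    acc = 0 ignores nested content; acc = 1 treats it like the node's own text.
    """
    own = sum(node.terms.count(t) * idf(t, components) for t in query)
    nested = sum(score(child, query, components, acc) for child in node.children)
    return own + acc * nested


# Tiny hypothetical document: doc -> (sec1 -> p1), sec2
p1 = Component("p1", ["retrieval", "retrieval"])
sec1 = Component("sec1", ["retrieval"], [p1])
sec2 = Component("sec2", ["database"])
doc = Component("doc", [], [sec1, sec2])

components = all_components(doc)
for node in components:
    print(node.name, round(score(node, ["retrieval"], components, acc=0.5), 3))
```

Under this assumed rule, lowering acc favours small components that contain the query terms directly, while raising it favours larger enclosing components; the paper estimates suitable acc values experimentally rather than fixing them by hand.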