Content and structure in indexing and ranking XML

  • Authors: Felix Weigel; Holger Meuss; Klaus U. Schulz; François Bry

  • Affiliations: University of Munich (LMU), Munich; European Southern Observatory, Headquarter Garching, Garching; University of Munich (LMU), Munich; University of Munich (LMU), Munich

  • Venue: Proceedings of the 7th International Workshop on the Web and Databases: colocated with ACM SIGMOD/PODS 2004
  • Year: 2004

Abstract

Rooted in electronic publishing, XML is now widely used for modelling and storing structured text documents. Especially on the WWW, retrieval of XML documents is most useful in combination with a relevance-based ranking of the query result. Index structures with ranking support are therefore needed for fast access to relevant parts of large document collections. This paper proposes a classification scheme for both XML ranking models and index structures, which makes it possible to determine which index suits which ranking model. An analysis reveals that ranking parameters related to both the content and structure of the data are poorly supported by most known XML indices. The IR-CADG index, owing to its tight integration of content and structure, supports various XML ranking models in a very efficient retrieval process. Experiments show that it outperforms separate content/structure indexing by more than two orders of magnitude for large corpora of several hundred MB.