Searching and Browsing Collections of Structural Information

  • Authors:
  • Jens E. Wolff;Holger Flörke;Armin B. Cremers

  • Affiliations:
  • -;-;-

  • Venue:
  • ADL '00 Proceedings of the IEEE Advances in Digital Libraries 2000
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper proposes a new approach to querying collections of structured textual information such as SGML/XML documents. Knowledge about the structure of documents is an additional resource that should be exploited during retrieval since the semantics of the different textual objects can be used to specify information need much more precisely. However, the traditional probabilistic retrieval model lacks the ability to handle structural information. We define a new retrieval function based on the probabilistic model, which overcomes this drawback. The presented query language allows the assignment of structural roles to individual terms. The efficient evaluation of queries in this framework requires appropriate index structures. We design text and structure indexes and show how their information is combined during evaluation. The implementation supports additional functionalities such as a table of contents for browsing. First evaluation results show the feasibility of the approach on collections of unstructured documents.