Structured information retrieval in XML documents

  • Authors: Evangelos Kotsakis
  • Affiliations: Joint Research Center (CCR), TP261, I-21020 Ispra (VA), Italy
  • Venue: Proceedings of the 2002 ACM symposium on Applied computing
  • Year: 2002

Abstract

Query languages that exploit the structure of XML documents already exist. However, the systems developed to query XML data approach XML sources from a database perspective. This paper instead examines an XML collection from the viewpoint of Information Retrieval (IR): we treat the XML documents as a collection of text documents with additional tags and adapt existing IR techniques to support more sophisticated search over them. We employ a class of queries that support path expressions and propose an efficient index that extends the inverted file structure to search XML documents. The index integrates the XML structure into the inverted file by combining it with a path index. The resulting structure is a lexicographical index that can be used to evaluate queries involving path expressions. The paper also discusses a ranking scheme based on both term distribution and document structure, and presents some remarks on performance.
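
To make the idea of an inverted file combined with a path index more concrete, the following is a minimal Python sketch. It is not the index proposed in the paper, whose details the abstract does not give. It assumes that each posting stores a document identifier together with the element path under which the term occurs, and that a path expression is matched as a simple path suffix. The names build_index, search, and the sample documents are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(docs):
    """Build an inverted file whose postings are (doc_id, element_path) pairs.

    Assumption: storing the path in each posting stands in for the
    combination of inverted file and path index described in the abstract.
    """
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        _walk(root, "", doc_id, index)
    return index


def _walk(elem, parent_path, doc_id, index):
    path = f"{parent_path}/{elem.tag}"
    # Text that appears directly inside this element, before any child.
    for term in (elem.text or "").lower().split():
        index[term].add((doc_id, path))
    for child in elem:
        _walk(child, path, doc_id, index)
        # Text following a child still belongs to the current element.
        for term in (child.tail or "").lower().split():
            index[term].add((doc_id, path))


def search(index, term, path_suffix=""):
    """Return sorted ids of documents where term occurs under a matching path.

    Assumption: a path expression is a suffix such as "/book/title";
    an empty suffix matches every path.
    """
    postings = index.get(term.lower(), ())
    return sorted({doc for doc, path in postings if path.endswith(path_suffix)})


# Hypothetical sample collection.
docs = {
    "d1": "<book><title>XML retrieval</title><author>Smith</author></book>",
    "d2": "<book><title>Databases</title><abstract>XML storage</abstract></book>",
}
index = build_index(docs)
```

With this sample collection, search(index, "xml") returns both documents, while search(index, "xml", "/book/title") returns only d1, because in d2 the term occurs under /book/abstract. A ranking scheme that uses both term distribution and document structure could be layered on top by weighting postings by path, but the abstract does not specify one, so none is sketched here.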