An Indexing Model for Structured Documents to Support Queries on Content, Structure and Attributes

  • Authors:
  • Tuong Dao

  • Affiliations:
  • -

  • Venue:
  • ADL '98 Proceedings of the Advances in Digital Libraries Conference
  • Year:
  • 1998

Quantified Score

Hi-index 0.00

Visualization

Abstract

The complex internal structure of documents can be described and captured by documentation representation standards such as SGML and SGML related standards like HTML and XML. The hierarchical structure of documents and the attributes of documents as well as attributes of document components at all levels of the document hierarchy can be encoded with markup tags. In traditional text database systems, only queries on content are supported. The rich structural information contained in documents and the attributes of document components are not captured in these systems, and queries on structure and attributes are not supported.We propose a text model, a query language and an indexing scheme which can support queries on content, structure, and attributes of documents as well as attributes of text elements within documents. This model is schema-independent, and query evaluation time is at worst linear. We show that our indexing scheme can efficiently support a wide range of queries in a database for highly heterogeneous collections of structured documents. We provide query examples to show how all the information encoded in documents marked up according to the TEI Guidelines, an encoding standard adopted by the humanities disciplines, can be indexed and queried in our indexing model.