Index Design for Structured Documents Based on Abstraction

  • Authors:
  • Jyh-Herng Chow; Josephine M. Cheng; Daniel T. Chang; Jane Xu

  • Venue:
  • DASFAA '99 Proceedings of the Sixth International Conference on Database Systems for Advanced Applications
  • Year:
  • 1999

Abstract

HTML has been the standard format for delivering information on the web. However, automated processing of HTML documents for data exchange and interoperability has been difficult. XML, a subset of SGML, has been proposed as the next standard format; it allows user-defined tags that better describe nested document structures and their associated semantics. Operations on structured documents, such as searching within nested document structures, require new functions that most systems do not currently provide. We describe a general framework for manipulating structured documents based on document abstractions. An abstraction is an approximation of an actual document that retains properties useful for the analyses of interest. The framework provides a wide design space for trading off cost against capability, and it can be applied to index design, document searching, and categorization. We present this framework by focusing on the indexing and searching of structured documents in the XML domain, and we prove the soundness of both. We also address the issues raised by rich data types in XML documents.
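
The abstract does not specify which abstractions the paper uses, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes one simple abstraction: the set of root-to-element tag paths in a document. The function names (`path_abstraction`, `build_index`, `candidates`, `matches`, `search`) and the query form (a tag path plus a word) are hypothetical. The index is built from the abstraction alone, and it is sound in the sense that it never discards a document that actually matches; it may still return candidates that the exact check later rejects.

```python
import xml.etree.ElementTree as ET


def path_abstraction(xml_text):
    """Approximate a document by the set of its root-to-element tag paths.

    Text content, attributes, and element order are discarded, so the
    abstraction is cheaper to store than the document itself.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = prefix + (elem.tag,)
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, ())
    return paths


def build_index(docs):
    """Map each tag path to the ids of the documents whose abstraction has it."""
    index = {}
    for doc_id, text in docs.items():
        for path in path_abstraction(text):
            index.setdefault(path, set()).add(doc_id)
    return index


def candidates(index, path):
    """Documents that may match a query on `path` (no false negatives)."""
    return index.get(tuple(path), set())


def matches(xml_text, path, word):
    """Exact check on the actual document: does some element reached by
    `path` contain `word` in its direct text?"""
    root = ET.fromstring(xml_text)
    if not path or root.tag != path[0]:
        return False
    elems = [root]
    for tag in path[1:]:
        elems = [child for e in elems for child in e if child.tag == tag]
    return any(word in (e.text or "") for e in elems)


def search(docs, index, path, word):
    """Filter with the index first, then verify only the candidates."""
    return sorted(d for d in candidates(index, path)
                  if matches(docs[d], path, word))
```

A coarser abstraction (for example, the set of tag names only) would give a smaller index but more false candidates to verify, while a finer one would cost more to store and prune more; this is one way to read the cost and capability trade-off mentioned in the abstract.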