Index Design for Structured Documents Based on Abstraction

Authors:
Jyh-Herng Chow;Josephine M. Cheng;Daniel T. Chang;Jane Xu
Venue:
DASFAA '99 Proceedings of the Sixth International Conference on Database Systems for Advanced Applications
Year:
1999

Abstract

HTML has been the standard format for delivering information on the web. However, automated processing of HTML documents for data exchange and interoperability has been difficult. XML, a subset of SGML, has been proposed as the next standard format; it allows user-defined tags that better describe nested document structures and their associated semantics. Operations on structured documents, such as searching within nested document structures, require functions that most systems today do not provide. We describe a general framework for manipulating structured documents based on document abstractions. An abstraction is an approximation of an actual document that retains the properties useful for the analyses of interest. The framework offers a wide design space for trading off cost against capability, and it can be applied to index design, document searching, and categorization. We present the framework by focusing on indexing and searching of structured documents in the XML domain, and we prove the soundness of these operations. We also address the issue of rich data types in XML documents.
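
To make the idea of indexing through an abstraction more concrete, the following is a minimal sketch and not the construction used in the paper. It assumes one hypothetical abstraction, the set of root-to-element tag paths in a document, and builds an index over those paths. The function names (`abstract_paths`, `build_index`, `candidates`) and the sample documents are invented for illustration. The abstraction discards text content and sibling order, so it is cheap to store but can only answer structural path queries; a document that is not returned as a candidate cannot contain the queried path.

```python
import xml.etree.ElementTree as ET


def abstract_paths(xml_text):
    """Hypothetical abstraction: the set of root-to-element tag paths."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        # Extend the path with this element's tag and record it.
        path = prefix + (node.tag,)
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, ())
    return paths


def build_index(docs):
    """Map each tag path to the ids of documents whose abstraction contains it."""
    index = {}
    for doc_id, xml_text in docs.items():
        for path in abstract_paths(xml_text):
            index.setdefault(path, set()).add(doc_id)
    return index


def candidates(index, path):
    """Ids of documents that may contain the path; all others cannot."""
    return index.get(tuple(path), set())


# Invented sample documents, used only to exercise the sketch.
docs = {
    "d1": "<book><title>XML</title><chapter><section>Intro</section></chapter></book>",
    "d2": "<article><title>SGML</title><section>Background</section></article>",
}
index = build_index(docs)
```

Calling `candidates(index, ["book", "chapter", "section"])` consults only the index, never the documents themselves. A coarser abstraction, such as the bare set of tag names, would shrink the index further at the price of returning more candidates that must later be checked against the actual documents.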