Efficient processing of expressive node-selecting queries on XML data in secondary storage: a tree automata-based approach

  • Authors:
  • Christoph Koch

  • Affiliations:
  • Laboratory for Foundations of Computer Science, University of Edinburgh, Edinburgh, UK

  • Venue:
  • VLDB '03: Proceedings of the 29th International Conference on Very Large Data Bases, Volume 29
  • Year:
  • 2003

Abstract

We propose a new, highly scalable and efficient technique for evaluating node-selecting queries on XML trees, based on recent advances in the theory of tree automata. Our query processing techniques require only two linear passes over the XML data on disk, and their main memory requirements are in principle independent of the size of the data. The overall running time is O(m + n), where m depends only on the query and n is the size of the data. The supported query language is very expressive: it captures exactly the node-selecting queries that can be answered with a bounded amount of memory (that is, all queries that can be answered by any form of finite-state system on XML trees). Visiting each tree node only twice is optimal; current automata-based approaches to answering path queries on XML streams, which use a single linear scan of the stream, are considerably less expressive. These technical results, which give rise to expressive query engines that handle large amounts of data in secondary storage more efficiently, are complemented by an experimental evaluation of our work.
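To illustrate why two linear passes can suffice for a node-selecting query, here is a minimal sketch that evaluates one fixed example query: select every node that has a proper ancestor labeled `a` and a descendant-or-self labeled `b`. The first pass runs bottom-up (reverse document order) and the second top-down (document order), and each pass keeps only a boolean, that is finite, state per node. This is a hand-written instance of the general bottom-up/top-down idea, not the paper's algorithm or query language. The flat `(label, parent_index)` encoding and the name `select_nodes` are hypothetical. For clarity, the sketch keeps all per-node states in Python lists, so it does not reproduce the paper's disk layout or its bounded-memory guarantee.

```python
from typing import List, Tuple

# Hypothetical flat encoding: nodes in document (pre-)order as (label, parent_index),
# with parent_index == -1 for the root. In pre-order, every child appears after its parent.
Node = Tuple[str, int]


def select_nodes(nodes: List[Node]) -> List[int]:
    """Select nodes that have a proper ancestor labeled 'a' and a
    descendant-or-self labeled 'b', using two linear scans with
    boolean (finite) per-node states."""
    n = len(nodes)

    # Pass 1 (bottom-up): scan in reverse document order, so each node is
    # finalized only after all of its children have been seen.
    # up[i] means "the subtree rooted at i contains a node labeled 'b'".
    up = [label == "b" for label, _ in nodes]
    for i in range(n - 1, -1, -1):
        parent = nodes[i][1]
        if parent >= 0 and up[i]:
            up[parent] = True

    # Pass 2 (top-down): scan in document order, so each parent's state is
    # known before any of its children are visited.
    # down[i] means "some proper ancestor of i is labeled 'a'".
    down = [False] * n
    selected = []
    for i in range(n):
        parent = nodes[i][1]
        if parent >= 0:
            down[i] = nodes[parent][0] == "a" or down[parent]
        # Combine the bottom-up state with the top-down state to decide selection.
        if down[i] and up[i]:
            selected.append(i)
    return selected


# Document <r><a><x><b/></x><y/></a><b/></r> in pre-order.
doc = [("r", -1), ("a", 0), ("x", 1), ("b", 2), ("y", 1), ("b", 0)]
print(select_nodes(doc))  # [2, 3]
```

Each node is touched once per pass, so the work is linear in the number of nodes. Node 4 (`y`) is rejected because its subtree has no `b`. Node 5 (`b`) is rejected because its only ancestor is `r`.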