Efficient processing of expressive node-selecting queries on XML data in secondary storage: a tree automata-based approach

Authors:
Christoph Koch
Affiliations:
Laboratory for Foundations of Computer Science, University of Edinburgh, Edinburgh, UK
Venue:
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Year:
2003


Abstract

We propose a new, highly scalable and efficient technique for evaluating node-selecting queries on XML trees, based on recent advances in the theory of tree automata. Our query processing techniques require only two linear passes over the XML data on disk, and their main memory requirements are in principle independent of the size of the data. The overall running time is O(m + n), where m depends only on the query and n is the size of the data. The supported query language is very expressive: it captures exactly the node-selecting queries that can be answered with a bounded amount of memory (thus, all queries that can be answered by any form of finite-state system on XML trees). Visiting each tree node only twice is optimal, and current automata-based approaches to answering path queries on XML streams, which use a single linear scan of the stream, are considerably less expressive. These technical results, which give rise to expressive query engines that handle large amounts of data in secondary storage more efficiently, are complemented by an experimental evaluation of our work.
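The two-pass idea can be illustrated with a small Python sketch. It is not the paper's algorithm. Instead of compiling a query into tree automata, it hard-codes one hypothetical finite-state query, "select every `b` node that has an `a` ancestor and a `c` descendant". It also models the document as a flat list of open/close events standing in for the file on disk. The event encoding and the functions `bottom_up_pass` and `top_down_pass` are illustrative assumptions. A backward scan computes one bit per node that summarizes its subtree. A forward scan then combines that bit with information inherited from the node's ancestors to decide whether the node is selected. Apart from the per-node bit array, each pass keeps only a stack whose height is bounded by the tree depth, not by the document size.

```python
# Hypothetical sketch of two-pass, finite-state node selection (not the paper's
# algorithm). Query: select every "b" node with an "a" ancestor and a "c" descendant.
from typing import List, Tuple

Event = Tuple[str, str]  # ("open" | "close", label), in document order


def bottom_up_pass(events: List[Event]) -> List[bool]:
    """Backward scan: for each node (pre-order id), does a strict descendant have label 'c'?"""
    n_nodes = sum(1 for kind, _ in events if kind == "open")
    has_c_below = [False] * n_nodes
    stack: List[bool] = []   # one accumulator per node whose subtree is being scanned
    next_id = n_nodes - 1    # scanning backwards, open tags appear in reverse pre-order
    for kind, label in reversed(events):
        if kind == "close":
            stack.append(False)   # entering a node's subtree from its right end
        else:
            acc = stack.pop()     # every descendant of this node has now been seen
            has_c_below[next_id] = acc
            next_id -= 1
            if stack:             # report this node's subtree to its parent
                stack[-1] = stack[-1] or acc or label == "c"
    return has_c_below


def top_down_pass(events: List[Event], has_c_below: List[bool]) -> List[int]:
    """Forward scan: combine inherited ancestor info with the stored subtree bits."""
    selected: List[int] = []
    stack: List[bool] = []   # per open node: is the node itself or an ancestor labeled 'a'?
    node_id = 0
    for kind, label in events:
        if kind == "open":
            a_above = stack[-1] if stack else False
            if label == "b" and a_above and has_c_below[node_id]:
                selected.append(node_id)
            stack.append(a_above or label == "a")
            node_id += 1
        else:
            stack.pop()
    return selected


# Usage: <r><a><b><c/></b><b/></a><b><c/></b></r>, nodes numbered 0..6 in pre-order.
doc: List[Event] = [
    ("open", "r"), ("open", "a"), ("open", "b"), ("open", "c"), ("close", "c"),
    ("close", "b"), ("open", "b"), ("close", "b"), ("close", "a"), ("open", "b"),
    ("open", "c"), ("close", "c"), ("close", "b"), ("close", "r"),
]
print(top_down_pass(doc, bottom_up_pass(doc)))  # → [2]
```

Only node 2, the first `b`, is selected: the second `b` (node 4) has no `c` below it, and the third `b` (node 5) has no `a` above it. Splitting the work this way means that neither scan needs to look ahead or buffer subtrees, because the backward scan has already stored all the information the forward scan needs about each node's descendants.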