Streaming evaluation is a popular way of evaluating queries on XML documents. Besides its many advantages, it is the only option for a number of important XML applications. Unfortunately, existing algorithms focus almost exclusively on tree-pattern queries (TPQs). Requirements for flexible querying of XML data have recently motivated the introduction of query languages that are more general and flexible than TPQs, and existing algorithms do not support these languages. In this paper, we consider a partial tree-pattern query (PTPQ) language which generalizes and strictly contains TPQs. PTPQs can express a fragment of XPath that comprises reverse axes and the node identity equality (is) operator, in addition to forward axes, wildcards and predicates. This fragment is an important subclass of XPath that is very useful in practice. Unfortunately, previous streaming algorithms for TPQs cannot be applied to PTPQs. PTPQs can be represented as DAGs enhanced with constraints, and we exploit this representation to design an original polynomial-time streaming algorithm for PTPQs. Our algorithm aggressively filters out incoming data that is irrelevant to the query and avoids processing redundant query matches (i.e., matches of the query DAG that do not contribute new solutions). It is the first algorithm to support the streaming evaluation of such a broad fragment of XPath. We analyze it and conduct an extensive experimental evaluation of its performance and scalability. Compared to the only known streaming algorithm that supports TPQs extended with reverse axes, our algorithm performs better by orders of magnitude while consuming a much smaller fraction of memory space. Current streaming applications have stringent requirements on query response time and memory consumption because of the large (possibly unbounded) size of the data they handle.
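The kind of order-free constraint that a DAG can encode but a single path cannot is easiest to see on a tiny case. The Python sketch below is a hypothetical toy and not the paper's algorithm. It assumes the query DAG consists only of one target node with ancestor constraints to a set of labels, which corresponds to the XPath `//c[ancestor::a][ancestor::b]`. It uses the standard-library `xml.etree.ElementTree.iterparse` to produce start/end events. The names `eval_partial_path` and `DOC` are illustrative. The sketch also shows the filtering idea in miniature: elements whose labels do not occur in the query never touch the evaluation state.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document. The query asks for <c> elements that have
# both an <a> ancestor and a <b> ancestor, in either order, i.e. the XPath
# //c[ancestor::a][ancestor::b].
DOC = b"""<r>
  <a><x><b><c id="1"/></b></x></a>
  <b><a><c id="2"/></a></b>
  <a><c id="3"/></a>
  <b><c id="4"/></b>
</r>"""


def eval_partial_path(xml_bytes, ancestor_labels, target):
    """Return the ids of `target` elements that have, for every label in
    `ancestor_labels`, at least one ancestor with that label.

    No order is imposed among the ancestor labels, so the query is a small
    DAG (every label points at `target`) rather than a single path.
    """
    relevant = set(ancestor_labels) | {target}
    open_count = Counter()  # label -> number of currently open elements
    answers = []
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        tag = elem.tag
        if event == "end":
            if tag in relevant:
                open_count[tag] -= 1
            elem.clear()  # free the contents of finished elements
            continue
        if tag not in relevant:
            continue  # filtering: irrelevant elements never touch the state
        # Check before counting this element, so it is not its own ancestor.
        if tag == target and all(open_count[a] > 0 for a in ancestor_labels):
            answers.append(elem.get("id"))
        open_count[tag] += 1
    return answers


print(eval_partial_path(DOC, ("a", "b"), "c"))  # ['1', '2']
```

On the sample document, the TPQ `//a//b//c` would return only `1`. The order-free constraint also admits `2`. A real PTPQ evaluator must handle arbitrary DAGs, predicates and the `is` operator, none of which this sketch attempts.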
To keep memory usage and CPU consumption low during streaming PTPQ evaluation, we design another streaming algorithm for PTPQs, called Eager PSX. Its key feature is an eager evaluation strategy that quickly determines when node matches should be returned to the user as solutions and proactively detects redundant matches. We analyze Eager PSX theoretically, test its time and space performance and scalability experimentally, and compare it with PSX, its non-eager counterpart. Our results show that Eager PSX not only achieves better space performance without compromising time performance, but also greatly improves query response time for both simple and complex queries, in many cases by orders of magnitude.
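The difference between lazy and eager reporting can be illustrated on a query much simpler than a general PTPQ. The sketch below is a hypothetical toy for the single query `//a[.//b]` and is not Eager PSX. A lazy evaluator reports each `<a>` at its end tag. An eager one reports it at its first `<b>` descendant, which is the earliest point at which the answer is certain. The function name `answers_with_positions` and the sample `DOC` are illustrative. The stack invariant the loop relies on (satisfied candidates form a contiguous bottom segment of the stack) is a property of this toy query, not a claim about the paper.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: <a id="1"> gains a <b> descendant early, then has a
# tail of unrelated children; <a id="2"> never gains one.
DOC = b'<r><a id="1"><b/><x/><x/><x/></a><a id="2"><x/></a></r>'


def answers_with_positions(xml_bytes, eager):
    """Evaluate //a[.//b] over a stream of start/end events.

    Returns (a_id, event_index) pairs, where event_index is the position in
    the stream at which the answer is reported to the user.
    """
    stack = []  # open <a> elements as [id, satisfied]
    out = []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for i, (event, elem) in enumerate(events):
        if event == "start" and elem.tag == "a":
            stack.append([elem.get("id"), False])
        elif event == "start" and elem.tag == "b":
            # Every open <a> is an ancestor of this <b>. Satisfied entries
            # always sit at the bottom of the stack, so stop at the first one.
            for cand in reversed(stack):
                if cand[1]:
                    break
                cand[1] = True
                if eager:
                    out.append((cand[0], i))  # eager: report as soon as known
        elif event == "end":
            if elem.tag == "a":
                a_id, satisfied = stack.pop()
                if satisfied and not eager:
                    out.append((a_id, i))  # lazy: report at the closing tag
            elem.clear()
    return out


print(answers_with_positions(DOC, eager=True))   # [('1', 2)]
print(answers_with_positions(DOC, eager=False))  # [('1', 10)]
```

In this sample, the eager variant reports the answer at event 2 instead of event 10. The gap grows with the size of the subtree that follows the first `<b>` under a matching `<a>`.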