We study the problem of extracting flattened tuple data from streaming, hierarchical XML data. Tuple-extraction queries are essentially XML pattern queries with multiple extraction nodes. Typical applications include mapping-based XML transformation and integrated (set-based) processing of XML and relational data. Holistic twig joins are known to match XML pattern queries optimally on parsed/indexed XML data, but applying them naïvely to streaming XML data incurs unnecessary disk I/Os. We adapt holistic twig joins for tuple-extraction queries on streaming XML with two novel features. First, we use the block-and-trigger technique to consume streaming XML data in a best-effort fashion without compromising the optimality of holistic matching. Second, to reduce peak buffer sizes and overall running times, we apply query-path pruning and existential-match pruning to aggressively filter irrelevant incoming data. We compare our solution with its direct competitor, TurboXPath, and with alternative approaches that use full-fledged query engines, such as XQuery or XSLT engines, for tuple extraction. Experiments using real-world XML data and queries demonstrated that our approach 1) outperformed its competitors by up to orders of magnitude and 2) exhibited almost linear scalability. Our solution has been demonstrated extensively to IBM customers and will be included in customer engagement applications in healthcare.
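To make the notion of a tuple-extraction query concrete, the following is a minimal sketch in Python. It is not the holistic twig join or the block-and-trigger technique described above; it only illustrates flattening hierarchical XML into tuples while reading a stream, with a simplified stand-in for query-path pruning. The function name `extract_tuples`, its parameters, and the sample document are all hypothetical and chosen for illustration.

```python
import io
import xml.etree.ElementTree as ET
from itertools import product


def extract_tuples(stream, anchor_path, fields):
    """Yield flattened tuples from an XML stream (illustrative sketch only).

    anchor_path: absolute tag path of the repeating element, e.g. ("lib", "book").
    fields: tag paths relative to the anchor, one per extraction node.
    Assumption: every field must match at least once, otherwise the anchor
    produces no tuple.
    """
    # Simplified query-path pruning: only prefixes of query paths are relevant.
    relevant = set()
    for field in fields:
        full = anchor_path + field
        for i in range(1, len(full) + 1):
            relevant.add(full[:i])

    path = []       # tag path from the root to the currently open element
    buffers = None  # per-field matches for the currently open anchor

    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            if tuple(path) == anchor_path:
                buffers = [[] for _ in fields]
            continue

        current = tuple(path)
        path.pop()
        if buffers is not None:
            for i, field in enumerate(fields):
                if current == anchor_path + field:
                    buffers[i].append((elem.text or "").strip())
        if current == anchor_path:
            # One tuple per combination of matches; empty if a field is missing.
            yield from product(*buffers)
            buffers = None
            elem.clear()  # release the finished anchor subtree
        elif current not in relevant:
            elem.clear()  # drop content that lies on no query path


doc = (b"<lib><book><title>A</title><authors><author>X</author>"
       b"<author>Y</author></authors><note>skip</note></book>"
       b"<book><title>B</title><authors><author>Z</author></authors></book></lib>")
rows = list(extract_tuples(io.BytesIO(doc), ("lib", "book"),
                           [("title",), ("authors", "author")]))
print(rows)  # [('A', 'X'), ('A', 'Y'), ('B', 'Z')]
```

The sketch buffers matches only for the currently open anchor element and emits tuples as soon as that element closes, which hints at why peak buffer size matters for streaming evaluation. Note that `iterparse` still parses every element; the pruning here only releases irrelevant content early rather than skipping it.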