StreamTX: extracting tuples from streaming XML data

Authors:
Wook-Shin Han; Haifeng Jiang; Howard Ho; Quanzhong Li
Affiliations:
Kyungpook National University, Republic of Korea; Google Inc., Mountain View, California; IBM Almaden Research Center, San Jose, California; IBM Almaden Research Center, San Jose, California
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Abstract

We study the problem of extracting flattened tuple data from streaming, hierarchical XML data. Tuple-extraction queries are essentially XML pattern queries with multiple extraction nodes. Typical applications include mapping-based XML transformation and integrated (set-based) processing of XML and relational data. Holistic twig joins are known to be optimal for matching XML pattern queries on parsed or indexed XML data, but applying them naïvely to streaming XML data incurs unnecessary disk I/O. We adapt holistic twig joins to tuple-extraction queries on streaming XML with two novel features: first, we use a block-and-trigger technique to consume streaming XML data in a best-effort fashion without compromising the optimality of holistic matching; second, to reduce peak buffer sizes and overall running times, we apply query-path pruning and existential-match pruning to aggressively filter out irrelevant incoming data. We compare our solution with its direct competitor, TurboXPath, and with alternative approaches that use full-fledged query engines, such as XQuery or XSLT engines, for tuple extraction. Experiments using real-world XML data and queries demonstrate that our approach (1) outperforms its competitors by up to orders of magnitude and (2) exhibits almost linear scalability. Our solution has been demonstrated extensively to IBM customers and will be included in customer engagement applications in healthcare.
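To make the notion of a tuple-extraction query concrete, the following is a minimal Python sketch. It is not the StreamTX algorithm: it does not implement holistic twig joins, block-and-trigger, or existential-match pruning. It assumes a deliberately simplified query form, a hypothetical `extract_tuples(xml_bytes, anchor, fields)` in which `anchor` is an absolute root-to-element tag path and each extraction node in `fields` is a child path relative to that anchor. Each anchor match emits the cartesian product of its extracted values, so one element can yield several flat tuples. It also shows a simple form of query-path pruning: elements that lie on no query path are dropped instead of being buffered.

```python
import io
import itertools
import xml.etree.ElementTree as ET


def extract_tuples(xml_bytes, anchor, fields):
    """Hypothetical tuple extractor over a streaming XML parse (a sketch).

    anchor: absolute root-to-element tag path, e.g. ("dept", "emp").
    fields: extraction nodes as child paths relative to the anchor,
            e.g. [("name",), ("phone",)].
    Each anchor match yields the cartesian product of its field values;
    an anchor that is missing any field yields no tuple at all.
    """
    anchor = tuple(anchor)
    fields = [tuple(f) for f in fields]
    query_paths = {anchor + f for f in fields}
    # Every prefix of a query path; elements off these paths are pruned.
    prefixes = {p[:i] for p in query_paths | {anchor} for i in range(1, len(p) + 1)}

    path = []            # tag path from the root to the current element
    pruned_depth = None  # depth at which the current irrelevant subtree began
    buffers = None       # per-field value lists for the open anchor match

    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start":
            path.append(elem.tag)
            if pruned_depth is None:
                if tuple(path) not in prefixes:
                    pruned_depth = len(path)  # query-path pruning: skip subtree
                elif tuple(path) == anchor:
                    buffers = [[] for _ in fields]
            continue

        # "end" event: the element's text content is now complete.
        current = tuple(path)
        if pruned_depth is None:
            if buffers is not None and current in query_paths:
                rel = current[len(anchor):]
                for i, f in enumerate(fields):
                    if f == rel:
                        buffers[i].append((elem.text or "").strip())
            if current == anchor:
                # Flatten this match into tuples (cartesian product of fields).
                yield from itertools.product(*buffers)
                buffers = None
        elif len(path) == pruned_depth:
            pruned_depth = None  # leaving the pruned subtree
        path.pop()
        elem.clear()  # drop the element's text, attributes, and children


if __name__ == "__main__":
    doc = b"""<dept>
  <name>R&amp;D</name>
  <emp><name>Ann</name><phone>111</phone><phone>222</phone><bio>long text</bio></emp>
  <emp><name>Bob</name></emp>
  <emp><name>Cy</name><phone>333</phone></emp>
</dept>"""
    for row in extract_tuples(doc, ("dept", "emp"), [("name",), ("phone",)]):
        print(row)
    # ('Ann', '111')
    # ('Ann', '222')
    # ('Cy', '333')
```

In this sketch, `iterparse` still tokenizes the pruned subtrees (the department `name` and each `bio`); pruning only avoids buffering and matching work for them. A real streaming engine would also need to handle descendant axes, recursive tags, and predicates, all of which this example omits.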