Tree pattern aggregation for scalable XML data dissemination

Authors:
Chee-Yong Chan; Wenfei Fan; Pascal Felber; Minos Garofalakis; Rajeev Rastogi
Affiliations:
Bell Labs, Lucent Technologies (all authors)
Venue:
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Year:
2002

Abstract

With the rapid growth of XML-document traffic on the Internet, scalable content-based dissemination of XML documents to a large, dynamic group of consumers has become an important research challenge. To indicate the type of content they are interested in, data consumers typically specify their subscriptions in an XML pattern specification language such as XPath. Given the large number of subscribers, system scalability and efficiency require the ability to aggregate the set of consumer subscriptions into a smaller set of content specifications, both to reduce their storage-space requirements and to speed up document-subscription matching. In this paper, we provide the first systematic study of subscription aggregation where subscriptions are specified with tree patterns (an important subclass of XPath expressions). The main challenge is to aggregate an input set of tree patterns into a smaller set of generalized tree patterns such that (1) a given space constraint on the total size of the subscriptions is met, and (2) the loss in precision during document filtering caused by aggregation is minimized. We propose an efficient tree-pattern aggregation algorithm that makes effective use of document-distribution statistics to compute a precise set of aggregate tree patterns within the allotted space budget. As part of our solution, we also develop several novel algorithms for tree-pattern containment and minimization, as well as for "least-upper-bound" computation for a set of tree patterns. These results are of independent interest and can prove useful in other domains, such as XML query optimization. Extensive results from a prototype implementation validate our approach.
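The tree-pattern containment problem mentioned above can be illustrated with a minimal sketch. It assumes a simple representation (the `Node` class and the `contained_in` function are hypothetical names, not from the paper) in which a pattern is a tree of element labels or `*` wildcards joined by child (`/`) or descendant (`//`) edges, and the pattern root is matched to the document root. The test searches for a homomorphism from q into p, a standard sufficient condition for p being contained in q (every document matching p also matches q). This is not claimed to be the authors' algorithm: for patterns that combine `*`, `//`, and branching together, the homomorphism test can miss some true containments. In an aggregation setting, a generalized pattern that replaces a subscription should contain it, so that no document matching the original subscription is lost.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical tree-pattern node: an element label or "*" (wildcard)."""
    label: str
    # Outgoing edges as (axis, child) pairs; axis is "/" (child) or "//" (descendant).
    children: list[tuple[str, Node]] = field(default_factory=list)

    def add(self, axis: str, child: Node) -> Node:
        self.children.append((axis, child))
        return self


def descendants(n: Node):
    """Yield every proper descendant of n in the pattern, whatever the edge type."""
    for _, c in n.children:
        yield c
        yield from descendants(c)


def homomorphism_exists(q: Node, p: Node) -> bool:
    """Is there a homomorphism from pattern q into pattern p mapping root to root?"""
    memo: dict[tuple[int, int], bool] = {}

    def hom(qn: Node, pn: Node) -> bool:
        key = (id(qn), id(pn))
        if key in memo:
            return memo[key]
        # A wildcard in q may map onto any label; otherwise labels must agree.
        ok = qn.label == "*" or qn.label == pn.label
        if ok:
            for axis, qc in qn.children:
                if axis == "/":
                    # A child edge in q must map onto a child edge in p.
                    targets = [pc for a, pc in pn.children if a == "/"]
                else:
                    # A descendant edge in q may map onto any proper descendant in p.
                    targets = list(descendants(pn))
                if not any(hom(qc, pc) for pc in targets):
                    ok = False
                    break
        memo[key] = ok
        return ok

    return hom(q, p)


def contained_in(p: Node, q: Node) -> bool:
    """Sufficient (not always necessary) test that p is contained in q."""
    return homomorphism_exists(q, p)


if __name__ == "__main__":
    p = Node("a").add("/", Node("b").add("/", Node("c")))  # /a/b[c]
    q = Node("a").add("//", Node("b"))                     # /a//b
    print(contained_in(p, q), contained_in(q, p))
```

Here `/a/b[c]` is contained in the more general `/a//b`, but not vice versa, so `/a//b` could serve as an aggregate for `/a/b[c]` at the cost of some precision. Memoizing on node pairs keeps the check polynomial in the sizes of the two patterns.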