The view selection problem for XML content based routing

Authors:
Ashish Kumar Gupta, Dan Suciu, Alon Y. Halevy
Affiliation:
University of Washington (all authors)
Venue:
Proceedings of the Twenty-Second ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2003)
Year:
2003

Abstract

We consider the view selection problem for XML content-based routing: given a network in which a stream of XML documents is routed, and in which routing decisions are made by evaluating XPath predicates on these documents, select a set of views that maximizes the throughput of the network. Whereas in view selection for relational queries the speedup comes from eliminating joins, here the speedup comes from gaining direct access to data values in an XML packet without parsing that packet. The views in our context can be seen as a binary representation of the XML document, tailored to the network's workload.

In this paper we formally define the view selection problem in the context of XML content-based routing and provide a practical solution for it. First, we formalize the problem; while the exact formulation is too complex to admit practical solutions, we show that it can be simplified to a manageable optimization problem without loss of precision. Second, we show that the simplified problem can be reduced to the Integer Cover problem, which is known to be NP-hard and to admit a greedy approximation algorithm with a log n bound. Third, we show that the same greedy algorithm performs much better on a class of workloads called "hierarchical workloads", which are typical in XML stream processing: it returns an optimal solution for hierarchical workloads and degrades gracefully to the general log n bound as the workload becomes less hierarchical.
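To make the reduction step more concrete, here is a minimal sketch of the textbook greedy rule for a 0/1 covering integer program: minimize the sum of cost[j] * x[j] subject to, for every requirement i, the sum over j of coverage[j][i] * x[j] being at least demand[i], with each x[j] in {0, 1}. The rule repeatedly picks the candidate with the best ratio of still-useful coverage to cost. The mapping to view selection (each candidate view j has a cost and contributes coverage toward per-predicate requirements), the function name `greedy_integer_cover`, and the sample numbers are all hypothetical illustrations. The abstract does not spell out the paper's exact Integer Cover formulation, so this should not be read as the authors' algorithm.

```python
from typing import List, Optional, Sequence


def greedy_integer_cover(
    demand: Sequence[int],
    coverage: Sequence[Sequence[int]],
    cost: Sequence[float],
) -> List[int]:
    """Hypothetical greedy heuristic for a 0/1 covering integer program.

    Minimize sum(cost[j] * x[j]) subject to
    sum_j coverage[j][i] * x[j] >= demand[i] for every i, x[j] in {0, 1}.
    Costs are assumed to be positive. Returns the chosen candidate indices
    in pick order; raises ValueError if the demands cannot be met.
    """
    residual = list(demand)          # unmet demand per requirement
    unused = set(range(len(cost)))   # candidates not yet picked
    chosen: List[int] = []
    while any(r > 0 for r in residual):
        best_j: Optional[int] = None
        best_ratio = 0.0
        for j in sorted(unused):
            # Only coverage that still reduces unmet demand counts as gain.
            gain = sum(min(a, r) for a, r in zip(coverage[j], residual))
            if gain > 0 and gain / cost[j] > best_ratio:
                best_j, best_ratio = j, gain / cost[j]
        if best_j is None:
            raise ValueError("demands cannot be covered by the remaining candidates")
        # Apply the chosen candidate's coverage, never going below zero.
        for i, a in enumerate(coverage[best_j]):
            residual[i] = max(0, residual[i] - a)
        unused.remove(best_j)
        chosen.append(best_j)
    return chosen


if __name__ == "__main__":
    # Hypothetical workload: 3 requirements, 4 candidate views.
    demand = [2, 1, 1]
    coverage = [[1, 1, 0], [1, 0, 1], [2, 1, 1], [0, 0, 1]]
    cost = [1.0, 1.0, 3.0, 0.5]
    print(greedy_integer_cover(demand, coverage, cost))  # -> [0, 1]
```

Capping each candidate's gain by the residual demand is the design choice that keeps the greedy ratio honest: coverage beyond what a requirement still needs earns no credit, so an expensive candidate that over-covers (candidate 2 above) loses to cheaper ones that meet the remaining demand exactly.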