We consider the view selection problem for XML content-based routing: given a network in which a stream of XML documents is routed, and in which routing decisions are based on the results of evaluating XPath predicates on these documents, select a set of views that maximizes the throughput of the network. While in view selection for relational queries the speedup comes from eliminating joins, here the speedup comes from gaining direct access to data values in an XML packet, without parsing that packet. The views in our context can be seen as a binary representation of the XML document, tailored to the network's workload.

In this paper we formally define the view selection problem in the context of XML content-based routing and provide a practical solution for it. First, we formalize the problem; while the exact formulation is too complex to admit practical solutions, we show that it can be simplified to a manageable optimization problem without loss of precision. Second, we show that the simplified problem can be reduced to the Integer Cover problem. The Integer Cover problem is known to be NP-hard and to admit a greedy log n approximation algorithm. Third, we show that the same greedy approximation algorithm performs much better on a class of workloads called 'hierarchical workloads', which are typical in XML stream processing. Namely, it returns an optimal solution for hierarchical workloads, and degrades gracefully to the general log n bound as the workload becomes less hierarchical.
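To make the greedy step concrete, here is a minimal sketch of the classic greedy heuristic on weighted set cover, which is a special case of the Integer Cover problem. It is not the paper's formulation: the function `greedy_cover`, the view and query names, and the costs are all hypothetical, chosen only to illustrate a small nested (hierarchical-looking) workload.

```python
def greedy_cover(universe, candidates, cost):
    """Greedy heuristic for weighted set cover (illustrative assumption,
    not the paper's exact Integer Cover instance).

    universe:   iterable of elements that must be covered (e.g. queries)
    candidates: dict mapping a candidate name to the set it covers
    cost:       dict mapping a candidate name to a positive cost
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best, best_ratio = None, None
        # Iterate in sorted order so that ties are broken deterministically.
        for name in sorted(candidates):
            gain = len(candidates[name] & uncovered)
            if gain == 0:
                continue  # this candidate covers nothing new
            ratio = cost[name] / gain  # cost per newly covered element
            if best_ratio is None or ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            raise ValueError("some elements cannot be covered")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical workload: four queries, with candidate views whose
# coverage sets are nested like a hierarchy.
queries = ["q1", "q2", "q3", "q4"]
views = {
    "v_root": {"q1", "q2", "q3", "q4"},
    "v_left": {"q1", "q2"},
    "v_right": {"q3", "q4"},
    "v_q1": {"q1"},
}
view_cost = {"v_root": 3, "v_left": 1, "v_right": 1, "v_q1": 1}

picked = greedy_cover(queries, views, view_cost)
total_cost = sum(view_cost[v] for v in picked)
print(picked, total_cost)
```

In this toy instance the candidate sets are nested, and the greedy choice (lowest cost per newly covered query at each step) happens to reach the minimum total cost. On arbitrary instances the same loop is only guaranteed to be within a logarithmic factor of the optimum.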