Near-optimal algorithms for shared filter evaluation in data stream systems

Authors:
Zhen Liu;Srinivasan Parthasarathy;Anand Ranganathan;Hao Yang
Affiliations:
IBM T. J. Watson Research Center, Hawthorne, NY, USA;IBM T. J. Watson Research Center, Hawthorne, NY, USA;IBM T. J. Watson Research Center, Hawthorne, NY, USA;IBM T. J. Watson Research Center, Hawthorne, NY, USA
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

We consider the problem of evaluating multiple overlapping queries defined on data streams, where each query is a conjunction of multiple filters and each filter may be shared across multiple queries. Efficient support for overlapping queries is a critical issue in emerging data stream systems, particularly when filters are expensive in terms of computational complexity and processing time. This problem generalizes other well-known problems such as pipelined filter ordering and set cover, and is not only NP-hard but also hard to approximate within a factor of o(log n) of the optimum, where n is the number of queries. In this paper, we present two near-optimal approximation algorithms with provably good performance guarantees for the evaluation of overlapping queries. We present an edge-coverage based Greedy algorithm which achieves an approximation ratio of (1 + log(n) + log(α)), where n is the number of queries and α is the average number of filters in a query. We also present a randomized, fast and easily parallelizable Harmonic algorithm which achieves an approximation ratio of 2β, where β is the maximum number of filters in a query. We have implemented these algorithms in a prototype system and evaluated their performance through extensive experiments in the context of multimedia stream analysis. The results show that our Greedy algorithm consistently outperforms other known algorithms under various settings and scales well as the numbers of queries and filters increase.
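
To make the problem setting concrete, the following is a minimal Python sketch of shared filter evaluation for a single stream item. It is not the paper's algorithm: the abstract does not define the edge-coverage rule, so the selection criterion used here (cheapest cost per expected number of query-filter edges resolved) is an assumption, as are the names `evaluate_shared`, `cost`, `pass_prob`, and `run_filter` and the example filters. What does follow from the abstract is the setting itself: each query is a conjunction of filters, a filter shared by several queries needs to run only once, and a query is resolved as soon as one of its filters fails or all of them pass.

```python
from typing import Callable, Dict, List, Set, Tuple


def evaluate_shared(
    queries: Dict[str, Set[str]],
    cost: Dict[str, float],
    pass_prob: Dict[str, float],
    run_filter: Callable[[str], bool],
) -> Tuple[Dict[str, bool], float, List[str]]:
    """Resolve every conjunctive query, running each shared filter at most once."""
    # Open (query, filter) edges of the bipartite query-filter graph.
    open_edges = {q: set(fs) for q, fs in queries.items()}
    # A query with no filters is trivially true.
    results: Dict[str, bool] = {q: True for q, fs in queries.items() if not fs}
    total_cost = 0.0
    order: List[str] = []

    while len(results) < len(queries):
        unresolved = [q for q in queries if q not in results]
        candidates = sorted({f for q in unresolved for f in open_edges[q]})

        def expected_edges(f: str) -> float:
            # ASSUMED scoring rule: a passing filter closes one edge per
            # query it touches; a failing filter closes every open edge
            # of those queries, because each of them becomes false.
            touched = [q for q in unresolved if f in open_edges[q]]
            if_true = len(touched)
            if_false = sum(len(open_edges[q]) for q in touched)
            return pass_prob[f] * if_true + (1.0 - pass_prob[f]) * if_false

        # Greedy choice: lowest cost per expected resolved edge.
        best = min(candidates, key=lambda f: cost[f] / expected_edges(f))
        outcome = run_filter(best)
        total_cost += cost[best]
        order.append(best)

        for q in unresolved:
            if best not in open_edges[q]:
                continue
            if outcome:
                open_edges[q].discard(best)
                if not open_edges[q]:
                    results[q] = True  # every filter of q passed
            else:
                open_edges[q].clear()
                results[q] = False  # one failed filter falsifies q

    return results, total_cost, order


# Hypothetical example: three queries sharing three filters.
queries = {
    "q1": {"face", "motion"},
    "q2": {"motion", "speech"},
    "q3": {"speech"},
}
cost = {"face": 4.0, "motion": 1.0, "speech": 2.0}
pass_prob = {"face": 0.5, "motion": 0.5, "speech": 0.5}
outcomes = {"face": True, "motion": False, "speech": True}

results, total_cost, order = evaluate_shared(
    queries, cost, pass_prob, lambda f: outcomes[f]
)
```

In this example the cheap, widely shared `motion` filter is chosen first. Its failure resolves `q1` and `q2` at once, so the expensive `face` filter never runs and the total cost is 3.0 rather than the 7.0 needed to run all three filters.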