Multiple aggregations over data streams

Authors:
Rui Zhang;Nick Koudas;Beng Chin Ooi;Divesh Srivastava
Affiliations:
National Univ. of Singapore;Univ. of Toronto;National Univ. of Singapore;AT&T Labs-Research
Venue:
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Year:
2005

Abstract

Monitoring aggregates on IP traffic data streams is a compelling application for data stream management systems. The need for exploratory IP traffic data analysis naturally leads to posing related aggregation queries on data streams that differ only in the choice of grouping attributes. In this paper, we address the problem of efficiently computing multiple aggregations over high-speed data streams, based on a two-level LFTA/HFTA DSMS architecture inspired by Gigascope. Our first contribution is the insight that in such a scenario, additionally computing and maintaining fine-granularity aggregation queries (phantoms) at the LFTA has the benefit of supporting shared computation. Our second contribution is an investigation into the problem of identifying beneficial LFTA configurations of phantoms and user queries. We formulate this problem as a cost optimization problem, which consists of two sub-optimization problems: how to choose phantoms and how to allocate space for them in the LFTA. We formally show the hardness of determining the optimal configuration, and propose cost-greedy heuristics for these independent sub-problems based on detailed analyses. Our final contribution is a thorough experimental study, based on real IP traffic data as well as synthetic data, to demonstrate the effectiveness of our techniques for identifying beneficial configurations.
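The shared-computation idea behind phantoms can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the record layout, and the use of unbounded in-memory counters are all assumptions made for illustration. In particular, it ignores the limited LFTA space and the way partial aggregates are handed to the HFTA, which are exactly what the paper's cost model addresses. The sketch only shows that maintaining one finer-granularity aggregate and rolling it up can reduce the number of table probes compared with updating every query per record.

```python
from collections import Counter

def direct_counts(records, queries):
    """Baseline (hypothetical): every query's table is probed once per record."""
    tables = {q: Counter() for q in queries}
    probes = 0
    for rec in records:
        for q in queries:
            tables[q][tuple(rec[a] for a in q)] += 1
            probes += 1
    return tables, probes

def phantom_counts(records, queries, phantom):
    """Shared plan (hypothetical): probe only the phantom per record,
    then roll each phantom group up into the user queries."""
    fine = Counter()
    probes = 0
    for rec in records:
        fine[tuple(rec[a] for a in phantom)] += 1
        probes += 1
    tables = {q: Counter() for q in queries}
    for key, count in fine.items():
        row = dict(zip(phantom, key))  # phantom group as attribute -> value
        for q in queries:
            tables[q][tuple(row[a] for a in q)] += count
            probes += 1
    return tables, probes

# Toy stream: 8 records falling into only 2 distinct (src, dst, port) groups.
records = [{"src": "a", "dst": "x", "port": 80}] * 4 \
        + [{"src": "b", "dst": "x", "port": 80}] * 4
queries = (("src",), ("dst",), ("port",))   # three user group-by queries
phantom = ("src", "dst", "port")            # finer-granularity phantom

direct, direct_probes = direct_counts(records, queries)
shared, shared_probes = phantom_counts(records, queries, phantom)
print(direct == shared)               # True: both plans give the same counts
print(direct_probes, shared_probes)   # 24 14
```

The saving in this toy case comes from records clustering into few phantom groups. With many distinct groups the phantom adds probes rather than removing them, which is why choosing phantoms and sizing their tables is posed as an optimization problem.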