Streaming multiple aggregations using phantoms

Authors:
Rui Zhang;Nick Koudas;Beng Chin Ooi;Divesh Srivastava;Pu Zhou
Affiliations:
University of Melbourne, Parkville, Australia;University of Toronto, Toronto, Canada;National University of Singapore, Singapore, Singapore;AT&T Labs-Research, Middletown, USA;University of Melbourne, Parkville, Australia
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2010

Abstract

Data streams characterize the high-speed, large-volume input of a new class of applications such as network monitoring, web content analysis and sensor networks. Among these applications, network monitoring may be the most compelling: the backbone of a large internet service provider can generate 1 petabyte of data per day. For many network monitoring tasks, such as traffic analysis and statistics collection, aggregation is a primitive operation, and various analytical and statistical needs naturally lead to sets of related aggregate queries. In this article, we address the problem of efficiently computing multiple aggregations over high-speed data streams, based on the two-level query processing architecture of GS, a real data stream management system deployed at AT&T. We observe that additionally computing and maintaining fine-granularity aggregations (called phantoms) has the benefit of supporting shared computation. Based on a thorough analysis, we propose algorithms to identify the best set of phantoms to maintain and to determine the allocation of resources (particularly, space) for computing the aggregations. Experiments show that our algorithm achieves near-optimal computation costs and outperforms the best adapted algorithm by more than an order of magnitude.
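The idea of a phantom can be illustrated with a minimal sketch. It is not the article's algorithm: it assumes simple COUNT aggregates, unbounded in-memory tables, and hypothetical attribute names (`src`, `dst`), and it ignores phantom selection and space allocation, which are the article's actual contributions. It only shows that one finer-granularity aggregate, maintained in a single pass over the stream, can feed several coarser queries.

```python
from collections import Counter

def aggregate_directly(stream, query_attrs):
    """Answer one group-by COUNT query by reading every record."""
    counts = Counter()
    for record in stream:
        counts[tuple(record[a] for a in query_attrs)] += 1
    return counts

def aggregate_via_phantom(stream, phantom_attrs, queries):
    """Maintain one fine-granularity aggregate (the phantom), then
    roll it up into each coarser query instead of rescanning the stream."""
    # Single pass over the stream: group by all phantom attributes.
    phantom = Counter()
    for record in stream:
        phantom[tuple(record[a] for a in phantom_attrs)] += 1

    # Each query is derived from the phantom's groups, not the raw records.
    results = {}
    for query_attrs in queries:
        positions = [phantom_attrs.index(a) for a in query_attrs]
        rolled_up = Counter()
        for group, count in phantom.items():
            rolled_up[tuple(group[p] for p in positions)] += count
        results[query_attrs] = rolled_up
    return results

# Hypothetical records standing in for network traffic tuples.
records = [
    {"src": "10.0.0.1", "dst": "10.0.0.9"},
    {"src": "10.0.0.1", "dst": "10.0.0.9"},
    {"src": "10.0.0.1", "dst": "10.0.0.7"},
    {"src": "10.0.0.2", "dst": "10.0.0.9"},
]
queries = [("src",), ("dst",)]
shared = aggregate_via_phantom(records, ("src", "dst"), queries)
```

Here the two queries (group by `src`, group by `dst`) are both derived from the phantom grouped by (`src`, `dst`), so each record is processed once rather than once per query. Whether maintaining a given phantom pays off in a bounded-memory setting is the question the article's analysis addresses.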