Efficient Approximation of Correlated Sums on Data Streams

Authors:
Rohit Ananthakrishna;Abhinandan Das;Johannes Gehrke;Flip Korn;S. Muthukrishnan;Divesh Srivastava
Affiliations:
-;-;IEEE Computer Society;IEEE Computer Society;-;IEEE Computer Society
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2003

Citing 9
Cited 3

The space complexity of approximating the frequency moments

Journal of Computer and System Sciences
Synopsis data structures for massive data sets

Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
On computing correlated aggregates over continual data streams

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Space-efficient online computation of quantile summaries

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Processing complex aggregate queries over data streams

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Querying Multiple Features of Groups in Relational Databases

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
An Approximate L1-Difference Algorithm for Massive Data Streams

FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Tribeca: a system for managing large databases of network traffic

ATEC '98 Proceedings of the annual conference on USENIX Annual Technical Conference

Fast and approximate stream mining of quantiles and frequencies using graphics processors

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Maintaining Sliding Window Skylines on Data Streams

IEEE Transactions on Knowledge and Data Engineering
Declaring independence via the sketching of sketches

Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms

Quantified Score

Hi-index	0.00

Visualization

Abstract

In many applications such as IP network management, data arrives in streams and queries over those streams need to be processed online using limited storage. Correlated-sum (CS) aggregates are a natural class of queries formed by composing basic aggregates on (x,y) pairs and are of the form SUM{g(y) : x \leq f(AGG(x))}, where AGG(x) can be any basic aggregate and f(), g() are user-specified functions. CS-aggregates cannot be computed exactly in one pass through a data stream using limited storage; hence, we study the problem of computing approximate CS-aggregates. We guarantee a priori error bounds when AGG(x) can be computed in limited space (e.g., MIN, MAX, AVG), using two variants of Greenwald and Khanna's summary structure for the approximate computation of quantiles. Using real data sets, we experimentally demonstrate that an adaptation of the quantile summary structure uses much less space, and is significantly faster, than a more direct use of the quantile summary structure, for the same a posteriori error bounds. Finally, we prove that, when AGG(x) is a quantile (which cannot be computed over a data stream in limited space), the error of a CS-aggregate can be arbitrarily large.