Range-Efficient Computation of F0 over Massive Data Streams

Authors:
A. Pavan; Srikanta Tirthapura
Affiliations:
Iowa State University; Iowa State University
Venue:
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Year:
2005

References (17):
Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences.
The space complexity of approximating the frequency moments. Journal of Computer and System Sciences.
Estimating simple functions on the union of data streams. Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures.
Approximate counting of inversions in a data stream. STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing.
Models and issues in data stream systems. Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
Reductions in streaming algorithms, with an application to counting triangles in graphs. SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms.
Maintaining Stream Statistics over Sliding Windows. SIAM Journal on Computing.
An Approximate L1-Difference Algorithm for Massive Data Streams. SIAM Journal on Computing.
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports. Proceedings of the 27th International Conference on Very Large Data Bases.
Sampling-Based Estimation of the Number of Distinct Values of an Attribute. VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases.
Counting Distinct Elements in a Data Stream. RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques.
Tight Lower Bounds for the Distinct Elements Problem. FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science.
Approximate Aggregation Techniques for Sensor Databases. ICDE '04 Proceedings of the 20th International Conference on Data Engineering.
An improved data stream algorithm for frequency moments. SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms.
Synopsis diffusion for robust aggregation in sensor networks. SenSys '04 Proceedings of the 2nd international conference on Embedded networked sensor systems.
Improved range-summable random variable construction algorithms. SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms.
Comparing data streams using Hamming norms (how to zero in). VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases.

Cited by (9):
Fast range-summable random variables for efficient aggregate estimation. Proceedings of the 2006 ACM SIGMOD international conference on Management of data.
Sketching asynchronous streams over a sliding window. Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing.
Pseudo-random number generation for sketch-based estimations. ACM Transactions on Database Systems (TODS).
Sketches for size of join estimation. ACM Transactions on Database Systems (TODS).
A survey on algorithms for mining frequent itemsets over data streams. Knowledge and Information Systems.
Two improved range-efficient algorithms for F0 estimation. Theoretical Computer Science.
Robust approximate aggregation in sensor data management systems. ACM Transactions on Database Systems (TODS).
Two improved range-efficient algorithms for F0 estimation. TAMC'07 Proceedings of the 4th international conference on Theory and applications of models of computation.
Aggregate computation over data streams. APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development.

Abstract

Efficient one-pass computation of $F_0$, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider the problem of efficiently estimating $F_0$ of a data stream where each element of the stream is an interval of integers. We present a randomized algorithm that gives an $(\varepsilon, \delta)$-approximation of $F_0$ with the following time complexity, where $n$ is the size of the universe of items: (1) the amortized processing time per interval is $O(\log\frac{1}{\delta} \log\frac{n}{\varepsilon})$; (2) the time to answer a query for $F_0$ is $O(\log\frac{1}{\delta})$. The workspace used is $O(\frac{1}{\varepsilon^2} \log\frac{1}{\delta} \log n)$ bits. Our algorithm improves upon a previous algorithm by Bar-Yossef, Kumar and Sivakumar [5], which requires $O(\frac{1}{\varepsilon^5} \log\frac{1}{\delta} \log^5 n)$ processing time per item. Our algorithm can be used to compute the max-dominance norm of a stream of multiple signals, and significantly improves upon the current best bounds due to Cormode and Muthukrishnan [11]. This also provides efficient and novel solutions for data aggregation problems in sensor networks studied by Nath and Gibbons [22] and Considine et al. [8].
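The abstract states bounds but not the mechanism. The Python sketch below is only an illustration under our own assumptions, not the algorithm of the paper. Each of several independent copies keeps the distinct elements whose hash $h(x) = (a x + b) \bmod p$ falls below a threshold that halves whenever the sample exceeds a fixed capacity, and the estimate is the median over copies of (sample size) times $p$ / threshold. To avoid expanding a long interval element by element, the number of sampled points inside [lo, hi] is computed with a Euclid-like floor-sum identity, and the points themselves are located by binary search on those counts. All names (floor_sum, count_hits, IntervalF0Sampler, estimate_f0, max_dominance), the prime $p = 2^{61} - 1$, and the parameters capacity (a stand-in for the $O(1/\varepsilon^2)$ sample size) and copies (a stand-in for $O(\log(1/\delta))$ repetitions) are hypothetical, and no attempt is made to match the stated time bounds. The last function shows one assumed way to reduce a max-dominance query to $F_0$ over intervals.

```python
import random
import statistics

# Hypothetical sketch (not the authors' algorithm): level-based distinct sampling over intervals.


def floor_sum(n, m, a, b):
    """Return sum(floor((a*i + b) / m) for i in range(n)) in O(log m) loop iterations."""
    total = 0
    while True:
        if not 0 <= a < m:  # move the integer part of a/m outside the sum
            total += (n * (n - 1) // 2) * (a // m)
            a %= m
        if not 0 <= b < m:  # same for b/m
            total += n * (b // m)
            b %= m
        y_max = a * n + b
        if y_max < m:  # every remaining term is 0
            return total
        n, b = divmod(y_max, m)  # exchange the roles of a and m, as in Euclid's algorithm
        m, a = a, m


def count_hits(lo, hi, a, b, p, t):
    """Count x in [lo, hi] with (a*x + b) % p < t, assuming 0 <= t <= p."""
    if hi < lo:
        return 0
    n, shift = hi - lo + 1, a * lo + b
    # floor(v/p) - floor((v - t)/p) equals 1 exactly when v % p < t, and 0 otherwise.
    return floor_sum(n, p, a, shift) - floor_sum(n, p, a, shift - t)


class IntervalF0Sampler:
    """One copy of a hypothetical level-based distinct sampler fed with integer intervals."""

    def __init__(self, p, capacity, rng):
        self.p, self.capacity = p, capacity
        self.a, self.b = rng.randrange(1, p), rng.randrange(p)  # h(x) = (a*x + b) mod p
        self.level, self.sample = 0, set()

    def _threshold(self):
        return self.p >> self.level  # x is sampled iff h(x) < floor(p / 2**level)

    def _hits(self, lo, hi):
        return count_hits(lo, hi, self.a, self.b, self.p, self._threshold())

    def _raise_level(self):
        self.level += 1
        t = self._threshold()
        self.sample = {x for x in self.sample if (self.a * x + self.b) % self.p < t}

    def _sampled_points(self, lo, hi):
        """Yield every sampled x in [lo, hi], locating each by binary search on counts."""
        while lo <= hi and self._hits(lo, hi) > 0:
            left, right = lo, hi
            while left < right:  # smallest r such that [lo, r] contains a sampled point
                mid = (left + right) // 2
                if self._hits(lo, mid) > 0:
                    right = mid
                else:
                    left = mid + 1
            yield left
            lo = left + 1

    def add_interval(self, lo, hi):
        # Raise the level first so that at most `capacity` points are ever enumerated.
        while self._hits(lo, hi) > self.capacity:
            self._raise_level()
        self.sample.update(self._sampled_points(lo, hi))
        while len(self.sample) > self.capacity:
            self._raise_level()

    def estimate(self):
        # Each element survives with probability threshold / p, so scale the sample size up.
        return len(self.sample) * self.p / self._threshold()


def estimate_f0(intervals, capacity=256, copies=7, p=(1 << 61) - 1, seed=0):
    """Median over independent copies; p must be prime and every element must lie in [0, p)."""
    rng = random.Random(seed)
    samplers = [IntervalF0Sampler(p, capacity, rng) for _ in range(copies)]
    for lo, hi in intervals:
        for sampler in samplers:
            sampler.add_interval(lo, hi)
    return statistics.median([sampler.estimate() for sampler in samplers])


def max_dominance(items, max_value, **kwargs):
    """Estimate sum over i of max_j f_j(i) from (i, v) pairs with i >= 0 and 1 <= v <= max_value."""
    # Assumed reduction: (i, v) becomes [i*max_value, i*max_value + v - 1], so the intervals
    # for position i cover exactly max_j f_j(i) distinct integers.
    return estimate_f0([(i * max_value, i * max_value + v - 1) for i, v in items], **kwargs)


if __name__ == "__main__":
    print(estimate_f0([(1, 10), (5, 20)]))  # small stream, kept exactly: 20.0
    print(estimate_f0([(0, 99_999)]))  # randomized estimate of 100000
```

With the default capacity, a small stream is kept entirely at level 0, so the first call returns exactly 20.0; the second call is randomized and should land near 100000 while only ever enumerating a few hundred sampled points from the interval.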