Probabilistic counting algorithms for data base applications
Journal of Computer and System Sciences
The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
Estimating simple functions on the union of data streams
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Approximate counting of inversions in a data stream
STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Reductions in streaming algorithms, with an application to counting triangles in graphs
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Maintaining Stream Statistics over Sliding Windows
SIAM Journal on Computing
An Approximate L1-Difference Algorithm for Massive Data Streams
SIAM Journal on Computing
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Sampling-Based Estimation of the Number of Distinct Values of an Attribute
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Counting Distinct Elements in a Data Stream
RANDOM '02 Proceedings of the 6th International Workshop on Randomization and Approximation Techniques
Tight Lower Bounds for the Distinct Elements Problem
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Approximate Aggregation Techniques for Sensor Databases
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
An improved data stream algorithm for frequency moments
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Synopsis diffusion for robust aggregation in sensor networks
SenSys '04 Proceedings of the 2nd international conference on Embedded networked sensor systems
Improved range-summable random variable construction algorithms
SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Comparing data streams using Hamming norms (how to zero in)
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Fast range-summable random variables for efficient aggregate estimation
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Sketching asynchronous streams over a sliding window
Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing
Pseudo-random number generation for sketch-based estimations
ACM Transactions on Database Systems (TODS)
Sketches for size of join estimation
ACM Transactions on Database Systems (TODS)
A survey on algorithms for mining frequent itemsets over data streams
Knowledge and Information Systems
Two improved range-efficient algorithms for F0 estimation
Theoretical Computer Science
Robust approximate aggregation in sensor data management systems
ACM Transactions on Database Systems (TODS)
Two improved range-efficient algorithms for F0 estimation
TAMC'07 Proceedings of the 4th international conference on Theory and applications of models of computation
Aggregate computation over data streams
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development
Efficient one-pass computation of $F_0$, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider the problem of efficiently estimating $F_0$ of a data stream where each element of the stream is an interval of integers. We present a randomized algorithm which gives an $(\varepsilon, \delta)$-approximation of $F_0$, with the following time complexity ($n$ is the size of the universe of the items): (1) The amortized processing time per interval is $O(\log\frac{1}{\delta}\log\frac{n}{\varepsilon})$. (2) The time to answer a query for $F_0$ is $O(\log\frac{1}{\delta})$. The workspace used is $O(\frac{1}{\varepsilon^2}\log\frac{1}{\delta}\log n)$ bits. Our algorithm improves upon a previous algorithm by Bar-Yossef, Kumar and Sivakumar [5], which requires $O(\frac{1}{\varepsilon^5}\log\frac{1}{\delta}\log^5 n)$ processing time per item. Our algorithm can be used to compute the max-dominance norm of a stream of multiple signals, and significantly improves upon the current best bounds due to Cormode and Muthukrishnan [11]. This also provides efficient and novel solutions for data aggregation problems in sensor networks studied by Nath and Gibbons [22] and Considine et al. [8].
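To make the quantity being estimated concrete, the following is a minimal sketch of an exact, offline baseline for the number of distinct integers covered by a stream of intervals. It is not the randomized algorithm described in the abstract: it stores and sorts every interval, so its memory grows with the stream length, which is exactly what a small-workspace streaming estimator is meant to avoid. The function name `exact_f0_of_intervals` and the closed-interval convention `[lo, hi]` are assumptions made for illustration only.

```python
from typing import Iterable, List, Optional, Tuple

# Assumption: each stream element is a closed integer interval [lo, hi].
Interval = Tuple[int, int]


def exact_f0_of_intervals(stream: Iterable[Interval]) -> int:
    """Return the exact count of distinct integers covered by the intervals.

    Offline baseline: sorts all intervals, then sweeps left to right,
    merging overlapping intervals and summing the lengths of the merged runs.
    """
    intervals: List[Interval] = sorted(stream)
    total = 0
    cur_lo: Optional[int] = None
    cur_hi: Optional[int] = None

    for lo, hi in intervals:
        if lo > hi:
            raise ValueError(f"invalid interval [{lo}, {hi}]")
        if cur_hi is None or lo > cur_hi:
            # The new interval starts past the current run: close the run.
            if cur_hi is not None:
                total += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
        else:
            # The new interval overlaps the current run: extend it if needed.
            cur_hi = max(cur_hi, hi)

    if cur_hi is not None:
        total += cur_hi - cur_lo + 1
    return total


if __name__ == "__main__":
    # [1, 5] and [3, 8] overlap and cover 1..8; [10, 10] adds one more integer.
    print(exact_f0_of_intervals([(1, 5), (3, 8), (10, 10)]))
```

Note that expanding each interval into its individual integers and feeding them to a per-item distinct-elements algorithm would cost time proportional to the interval length, which is why a per-interval processing bound matters for this problem.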