In data streaming applications, data arrives at high rates and in high volume, so each stream update must be processed very efficiently in both time and space. A data stream is a sequence of data records that must be processed continuously, in an online fashion, using sublinear space and sublinear processing time. We consider the problem of tracking the number of distinct items in data streams that allow both insertions and deletions. We present two algorithms that improve on the space and time complexity of existing algorithms.
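
To make the problem statement concrete, the following is a minimal sketch of an exact baseline, not either of the algorithms announced above. It assumes that each update is an (item, delta) pair with a positive delta for an insertion and a negative delta for a deletion, and that an item counts as distinct while its net frequency is nonzero. The class name `ExactDistinctTracker` and its methods are hypothetical. The baseline uses space linear in the number of live items, which is exactly the cost a sublinear-space algorithm is meant to avoid.

```python
from collections import Counter


class ExactDistinctTracker:
    """Exact distinct count under insertions and deletions.

    Illustrative baseline only: memory grows linearly with the number
    of items whose net frequency is nonzero.
    """

    def __init__(self):
        # Net frequency per item; entries that reach zero are removed.
        self._freq = Counter()

    def update(self, item, delta):
        """Apply an insertion (delta > 0) or a deletion (delta < 0)."""
        new_count = self._freq[item] + delta
        if new_count == 0:
            # The item has been fully deleted, so it no longer counts.
            self._freq.pop(item, None)
        else:
            self._freq[item] = new_count

    def distinct(self):
        """Return the number of items with nonzero net frequency."""
        return len(self._freq)


tracker = ExactDistinctTracker()
tracker.update("a", +1)
tracker.update("b", +1)
tracker.update("a", +1)
tracker.update("a", -1)
print(tracker.distinct())  # "a" and "b" are both still present
```

A deletion lowers the distinct count only when it removes the last remaining copy of an item. This is why a structure that records only whether an item was ever seen cannot handle deletions on its own.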