Approximate counting: a detailed analysis
BIT
New sampling-based summary statistics for improving approximate query answers
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The space complexity of approximating the frequency moments
Journal of Computer and System Sciences
Counting large numbers of events in small registers
Communications of the ACM
Computing Iceberg Queries Efficiently
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Sampling Large Databases for Association Rules
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Frequency Estimation of Internet Packet Streams with Limited Space
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
A simple algorithm for finding frequent elements in streams and bags
ACM Transactions on Database Systems (TODS)
Finding frequent items in data streams
Theoretical Computer Science - Special issue on automata, languages and programming
Approximate counts and quantiles over sliding windows
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Methods for finding frequent items in data streams
The VLDB Journal — The International Journal on Very Large Data Bases
Methods for mining frequent items in data streams: an overview
Knowledge and Information Systems
Finding frequent items in a data stream is a fundamental problem: given a threshold $\theta \in (0,1)$, find the items that appear more than $\theta N$ times in an input stream of length $N$. Karp, Shenker and Papadimitriou (2003) gave a simple deterministic online algorithm that uses $O(\theta^{-1}\log N)$ bits of memory but may output false positives; they also gave a lower bound on the space required. Motivated by this theoretical bound on the space complexity, this paper proposes a simple randomized online algorithm that uses $O(\theta^{-2}\log^2\theta^{-1} + \log\log N)$ bits of memory, where the approximation parameters are hidden in the constant factor. Compared with other naïve randomized algorithms, or with deterministic algorithms that use $O(\log N)$ bits of memory, our algorithm is robust against memory overflow. We also give some randomized algorithms for approximate counting.
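The abstract refers to two classical building blocks without spelling them out. The Python sketch below is only an illustration under stated assumptions. It shows (1) the deterministic one-pass algorithm of Karp, Shenker and Papadimitriou (2003), which keeps at most $\lfloor 1/\theta \rfloor$ counters of $O(\log N)$ bits each and returns a superset of the items occurring more than $\theta N$ times, and (2) Morris's approximate counter from "Counting large numbers of events in small registers", which stores only an exponent of roughly $\log\log N$ bits. Neither is the randomized frequent-items algorithm proposed in this paper, which the abstract does not describe. The names `ksp_frequent` and `MorrisCounter` are hypothetical.

```python
import random


def ksp_frequent(stream, theta):
    """One-pass candidate finder in the style of Karp, Shenker, Papadimitriou.

    Keeps at most floor(1/theta) counters. Every item occurring more than
    theta * N times is guaranteed to be in the returned set, but the set may
    also contain false positives. (Hypothetical helper name.)
    """
    capacity = int(1 / theta)  # maximum number of counters kept at any time
    counts = {}
    for x in stream:
        counts[x] = counts.get(x, 0) + 1
        if len(counts) > capacity:
            # capacity + 1 distinct items are tracked: decrement each once and
            # drop counters that reach zero. Each such round discards more than
            # 1/theta occurrences, so at most theta * N rounds can happen.
            for key in list(counts):
                counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
    return set(counts)


class MorrisCounter:
    """Morris (1978) approximate counter that stores only an exponent c.

    Each increment raises c with probability 2**-c, and 2**c - 1 is an
    unbiased estimate of the number of increments. Since c stays close to
    log2(n), storing it needs only about log2(log2(n)) bits.
    (Hypothetical class name.)
    """

    def __init__(self, rng=None):
        self.c = 0
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Bump the exponent with probability 2**-c.
        if self.rng.random() < 2.0 ** -self.c:
            self.c += 1

    def estimate(self):
        return 2 ** self.c - 1
```

For example, with $\theta = 0.25$ on a 1,000-item stream in which `a` occurs 400 times and `b` 300 times, `ksp_frequent` must return both, possibly alongside up to two false positives. Discarding those requires a second pass over the data, which a strictly online algorithm cannot make. A single Morris counter has high variance, so in practice several independent counters are averaged.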