A randomized algorithm for finding frequent elements in streams using O(log log N) space

Authors:
Masatora Ogata;Yukiko Yamauchi;Shuji Kijima;Masafumi Yamashita
Affiliations:
Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan;Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan;Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan;Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan
Venue:
ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation
Year:
2011

References:
Approximate counting: a detailed analysis. BIT - Ellis Horwood series in artificial intelligence
New sampling-based summary statistics for improving approximate query answers. SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
The space complexity of approximating the frequency moments. Journal of Computer and System Sciences
Counting large numbers of events in small registers. Communications of the ACM
Computing Iceberg Queries Efficiently. VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Sampling Large Databases for Association Rules. VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Frequency Estimation of Internet Packet Streams with Limited Space. ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
A simple algorithm for finding frequent elements in streams and bags. ACM Transactions on Database Systems (TODS)
Finding frequent items in data streams. Theoretical Computer Science - Special issue on automata, languages and programming
Approximate counts and quantiles over sliding windows. PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Approximate frequency counts over data streams. VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Methods for finding frequent items in data streams. The VLDB Journal — The International Journal on Very Large Data Bases
Methods for mining frequent items in data streams: an overview. Knowledge and Information Systems


Abstract

Finding frequent items in a data stream is a fundamental problem: given a threshold $\theta \in (0,1)$, find the items that appear more than $\theta N$ times in an input stream of length $N$. Karp, Shenker, and Papadimitriou (2003) gave a simple deterministic online algorithm that uses $O(\theta^{-1} \log N)$ bits of memory and allows false positive outputs; they also gave a lower bound. Motivated by this theoretical bound on the space complexity, this paper proposes a simple randomized online algorithm that uses $O(\theta^{-2} \log^2 \theta^{-1} + \log\log N)$ bits of memory, where the approximation parameters are hidden in the constant. Compared with other naïve randomized algorithms, or with deterministic algorithms using $O(\log N)$ bits of memory, our algorithm is robust against memory overflow. We also give some randomized algorithms for approximate counting.
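
To illustrate the deterministic baseline mentioned in the abstract, the following is a minimal Python sketch of a counter-based candidate search in the style of Karp, Shenker, and Papadimitriou. It is not the randomized algorithm proposed in this paper. The function name `frequent_candidates`, the choice of $\lceil 1/\theta \rceil - 1$ counters, and the sample stream are assumptions made for illustration only.

```python
import math


def frequent_candidates(stream, theta):
    """Return a superset of the items occurring more than theta * N times.

    Illustrative sketch only: keeps at most ceil(1/theta) - 1 counters,
    so the result may contain false positives.
    """
    if not 0 < theta < 1:
        raise ValueError("theta must lie in (0, 1)")
    k = max(1, math.ceil(1 / theta) - 1)  # number of counters (assumed choice)
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # All k slots are taken: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return set(counters)


# Hypothetical sample stream of length N = 14.
stream = list("abacabadabaeaf")
candidates = frequent_candidates(stream, theta=0.25)
```

In this sample stream, `a` occurs 7 times, which exceeds $\theta N = 3.5$, so it must be among the candidates. `b` occurs only 3 times yet also survives, which shows the kind of false positive the abstract refers to; a second pass over the stream would be needed to filter such items out.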