Memory Efficient Algorithm for Mining Recent Frequent Items in a Stream

Authors:
Piotr Kołaczkowski
Affiliations:
Warsaw University of Technology, Institute of Computer Science
Venue:
RSEISP '07 Proceedings of the international conference on Rough Sets and Intelligent Systems Paradigms
Year:
2007

Abstract

In this paper we present an improved version of a multistage-hashing-based algorithm for finding frequent items in a stream. Our algorithm uses low-pass filters in place of simple counters, so it gives more weight to recent items and gradually forgets old ones. This behaviour is similar to that of sliding-window-based algorithms, but it requires less memory and is suitable for real-time applications. The algorithm continuously produces frequency estimates for the most frequent items. It was tested on streams with various frequency distributions and was found to work correctly.
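
The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the filter or the hash functions, so the code below assumes a first-order low-pass filter (exponential smoothing) in each bucket, salted BLAKE2b hashes for the stages, and lazy decay driven by a global item counter. The class name `DecayingMultistageFilter` and the parameters `stages`, `buckets`, and `alpha` are hypothetical.

```python
import hashlib


class DecayingMultistageFilter:
    """Multistage hash filter whose buckets are low-pass filters, not counters.

    Assumption (not from the paper): each bucket follows
    y[t] = (1 - alpha) * y[t-1] + alpha * x[t], where x[t] is 1 when the item
    arriving at step t hashes to the bucket and 0 otherwise.
    """

    def __init__(self, stages=4, buckets=256, alpha=0.01):
        self.stages = stages
        self.buckets = buckets
        self.alpha = alpha
        self.time = 0  # number of items seen so far
        self.values = [[0.0] * buckets for _ in range(stages)]
        self.last = [[0] * buckets for _ in range(stages)]  # last update step

    def _index(self, stage, item):
        # One independent-looking hash per stage, obtained by salting BLAKE2b.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=stage.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.buckets

    def _decayed(self, stage, idx):
        # Apply the decay for all steps in which the bucket received no item.
        elapsed = self.time - self.last[stage][idx]
        return self.values[stage][idx] * (1.0 - self.alpha) ** elapsed

    def update(self, item):
        """Feed one stream item into every stage."""
        self.time += 1
        for stage in range(self.stages):
            idx = self._index(stage, item)
            self.values[stage][idx] = self._decayed(stage, idx) + self.alpha
            self.last[stage][idx] = self.time

    def estimate(self, item):
        """Estimate the recent relative frequency of item (0.0 to 1.0).

        Hash collisions can only inflate a bucket, so the minimum over the
        stages is the tightest available estimate.
        """
        return min(
            self._decayed(stage, self._index(stage, item))
            for stage in range(self.stages)
        )


if __name__ == "__main__":
    f = DecayingMultistageFilter()
    for i in range(2000):
        # "a" makes up half of the stream; the rest are one-off items.
        f.update("a" if i % 2 == 0 else ("noise", i))
    print(f.estimate("a"))
```

Memory use in this sketch is fixed at `stages * buckets` filter values regardless of stream length, which is the property that distinguishes it from keeping the full contents of a sliding window. A smaller `alpha` corresponds to a longer effective memory, at the cost of slower reaction to changes in the stream.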