Mining top-k frequent items in a data stream with flexible sliding windows

Authors:
Hoang Thanh Lam; Toon Calders
Affiliations:
TU Eindhoven, Eindhoven, Netherlands; TU Eindhoven, Eindhoven, Netherlands
Venue:
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2010

Abstract

We study the problem of finding the k most frequent items in a stream of items under the recently proposed max-frequency measure. The max-frequency of an item is computed over a sliding window whose length is chosen dynamically, depending on the item itself. Besides being parameterless, this support measure was shown to detect bursts in a stream faster, especially when the set of items is heterogeneous. However, the algorithm previously proposed for maintaining all frequent items scales poorly when the number of items becomes large. Therefore, instead of reporting all frequent items, in this paper we propose to mine only the top-k most frequent ones. First, we prove that solving this problem exactly still requires a prohibitive amount of memory (at least linear in the number of items). Yet, under some reasonable conditions, we show both theoretically and empirically that a memory-efficient algorithm exists. We implemented a prototype of this algorithm and report its memory efficiency on real-life data and in controlled experiments with synthetic data.
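The abstract does not spell out the max-frequency measure itself. As a point of reference, here is a brute-force Python sketch under an assumed definition (not stated in this abstract): the max-frequency of an item a after a stream S is the maximum, over all window lengths w ≥ 1, of the fraction of the last w items of S that equal a. The function names `max_frequency` and `top_k_max_frequency` are hypothetical. The sketch stores the entire stream, so it only illustrates the measure and the top-k query on small inputs; it is not the memory-efficient algorithm the paper proposes.

```python
from fractions import Fraction
from typing import Hashable, List, Sequence, Tuple


def max_frequency(stream: Sequence[Hashable], item: Hashable) -> Fraction:
    """Max-frequency of `item` after `stream` (ASSUMED definition):
    the largest value of count(item in last w items) / w over all w >= 1."""
    best = Fraction(0)
    count = 0
    # Walk backwards from the most recent item, growing the window by one.
    for w, x in enumerate(reversed(stream), start=1):
        if x == item:
            count += 1
            # Adding a non-matching item only lowers the ratio, so the maximum
            # is reached at a window whose oldest element is an occurrence.
            best = max(best, Fraction(count, w))
    return best


def top_k_max_frequency(
    stream: Sequence[Hashable], k: int
) -> List[Tuple[Fraction, Hashable]]:
    """Hypothetical brute-force top-k by max-frequency (memory grows with the stream)."""
    # dict.fromkeys keeps first-seen order; the stable sort below then breaks
    # ties by that order, so results are reproducible.
    distinct = dict.fromkeys(stream)
    scored = [(max_frequency(stream, a), a) for a in distinct]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    stream = ["a", "b", "a", "c", "c", "c"]
    print(top_k_max_frequency(stream, 2))
    # [(Fraction(1, 1), 'c'), (Fraction(1, 3), 'a')]
```

Exact fractions avoid floating-point ties when comparing scores. Note that this naive version must rescan the whole stream for every item on every query, which is the kind of cost a streaming top-k algorithm is meant to avoid.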