Maintaining frequent itemsets over high-speed data streams

  • Authors:
  • James Cheng; Yiping Ke; Wilfred Ng

  • Affiliations:
  • Department of Computer Science, Hong Kong University of Science and Technology, Kowloon, Hong Kong, China (all three authors)

  • Venue:
  • PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2006

Abstract

We propose a false-negative approach to approximating the set of frequent itemsets (FIs) over a sliding window. Existing approximate algorithms use an error parameter, ε, to control the accuracy of the mining result. However, the use of ε leads to a dilemma: a smaller ε gives a more accurate mining result at a higher computational cost, while a larger ε degrades the mining accuracy. We address this dilemma by introducing a progressively increasing minimum support function. The longer an itemset is retained in the window, the closer its required minimum support comes to the minimum support of an FI. As a result, far fewer potential FIs need to be maintained. Our experiments show that our algorithm not only attains highly accurate mining results, but also runs significantly faster and consumes less memory than existing algorithms for mining FIs over a sliding window.
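The abstract does not state the exact form of the progressively increasing minimum support function, so the Python sketch below is only an illustration under stated assumptions, not the authors' algorithm. It assumes a window of `w` batches, a minimum support `sigma` for an FI, and a hypothetical relaxation ratio `r`: an itemset tracked for one batch must reach `r * sigma`, and the requirement rises linearly to `sigma` once the itemset has been tracked for the whole window. The names `relaxed_minsup`, `batch_counts`, `SlidingWindowMiner`, `r`, and `max_len` are hypothetical, and itemsets are counted by brute force rather than with the paper's data structures. Because every count comes only from batches still in the window, a reported count never exceeds the true window count, so the sketch produces no false positives; an itemset pruned early can be missed, which is the false-negative trade-off.

```python
from collections import Counter, deque
from itertools import combinations


def relaxed_minsup(k, w, sigma, r):
    """Hypothetical progressively increasing minimum support (relative).

    An itemset tracked for k of the w window batches must reach r * sigma
    when k == 1; the requirement rises linearly to sigma when k == w.
    """
    if w == 1:
        return sigma
    return sigma * (r + (1 - r) * (k - 1) / (w - 1))


def batch_counts(batch, max_len):
    """Count every itemset of size 1..max_len in one batch (brute force)."""
    counts = Counter()
    for txn in batch:
        items = sorted(set(txn))
        for size in range(1, max_len + 1):
            counts.update(combinations(items, size))
    return counts


class SlidingWindowMiner:
    """Hypothetical miner: keeps per-batch counts only for itemsets that pass
    the age-dependent threshold, and reports those meeting sigma."""

    def __init__(self, w, sigma, r, max_len=3):
        self.w, self.sigma, self.r, self.max_len = w, sigma, r, max_len
        self.sizes = deque(maxlen=w)  # transaction count of each batch in the window
        self.tracked = {}             # itemset -> deque of per-batch counts (newest last)

    def add_batch(self, batch):
        self.sizes.append(len(batch))
        counts = batch_counts(batch, self.max_len)
        # Extend every tracked itemset by this batch; maxlen=w expires old batches.
        for itemset, hist in self.tracked.items():
            hist.append(counts.get(itemset, 0))
        # Start tracking itemsets not already tracked.
        for itemset, c in counts.items():
            if itemset not in self.tracked:
                self.tracked[itemset] = deque([c], maxlen=self.w)
        # Prune with the threshold that depends on how long each itemset was tracked.
        sizes = list(self.sizes)
        for itemset in list(self.tracked):
            hist = self.tracked[itemset]
            k = len(hist)
            n = sum(sizes[-k:])  # transactions covered by this itemset's history
            if sum(hist) < relaxed_minsup(k, self.w, self.sigma, self.r) * n:
                del self.tracked[itemset]

    def frequent_itemsets(self):
        """Itemsets whose (lower-bound) count meets sigma over the whole window."""
        n = sum(self.sizes)
        return {s: sum(h) for s, h in self.tracked.items() if sum(h) >= self.sigma * n}


miner = SlidingWindowMiner(w=2, sigma=0.5, r=0.5, max_len=2)
miner.add_batch([["a", "b"], ["a"], ["c"], ["a", "c"]])
miner.add_batch([["a", "b"], ["b"], ["a", "b"], ["d"]])
print(miner.frequent_itemsets())
```

In this sketch, moving `r` toward 1 prunes more aggressively (fewer tracked itemsets, more possible false negatives), while moving it toward 0 keeps more candidates in memory.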