DELAY: A Lazy Approach for Mining Frequent Patterns over High Speed Data Streams

  • Authors:
  • Hui Yang;Hongyan Liu;Jun He

  • Affiliations:
  • Information School, Renmin University of China, Beijing, 100872, China;School of Economics and Management, Tsinghua University, Beijing, 100084, China;Information School, Renmin University of China, Beijing, 100872, China

  • Venue:
  • ADMA '07 Proceedings of the 3rd international conference on Advanced Data Mining and Applications
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Frequent pattern mining has emerged as an important mining task in data stream mining. A number of algorithms have been proposed. These algorithms usually use a method of two steps: one is calculating the frequency of itemsets while monitoring each arrival of the data stream, and the other is to output the frequent itemsets according to user's requirement. Due to the large number of item combinations for each transaction occurred in data stream, the first step costs lots of time. Therefore, for high speed long transaction data streams, there may be not enough time to process every transactions arrived in stream, which will reduce the mining accuracy. In this paper, we propose a new approach to deal with this issue. Our new approach is a kind of lazy approach, which delays calculation of the frequency of each itemset to the second step. So, the first step only stores necessary information for each transaction, which can avoid missing any transaction arrival in data stream. In order to improve accuracy, we propose monitoring items which are most likely to be frequent. By this method, many candidate itemsets can be pruned, which leads to the good performance of the algorithm, DELAY, designed based on this method. A comprehensive experimental study shows that our algorithm achieves some improvements over existing algorithms, LossyCountingand FDPM, especially for long transaction data streams.