Sliding window filtering: an efficient method for incremental mining on a time-variant database.

  • Authors:
  • Chang-Hung Lee;Cheng-Ru Lin;Ming-Syan Chen

  • Affiliations:
  • Department of Electrical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei, Taiwan, ROC.;Department of Electrical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei, Taiwan, ROC.;Department of Electrical Engineering, National Taiwan University, No. 1, Sec. 4, Roosevelt Rd., Taipei, Taiwan, ROC.

  • Venue:
  • Information Systems
  • Year:
  • 2005

Abstract

Recently, several important database applications have called for efficient techniques for incremental mining of association rules. In response to this need, this paper explores an effective sliding-window filtering (SWF) algorithm for incremental mining of association rules. In essence, algorithm SWF divides a transaction database into several partitions and employs a filtering threshold in each partition to control candidate itemset generation. Under SWF, the cumulative information from mining previous partitions is selectively carried over to the generation of candidate itemsets for subsequent partitions. Algorithm SWF not only significantly reduces I/O and CPU cost through cumulative filtering and scan reduction techniques but also effectively controls memory utilization through sliding-window partitioning. More importantly, algorithm SWF is particularly well suited to efficient incremental mining of an ongoing time-variant transaction database. With proper scan reduction techniques, algorithm SWF needs only one scan of the incremented dataset. The I/O cost of SWF is orders of magnitude smaller than that required by prior methods, thus resolving the performance bottleneck. Extensive experimental studies evaluate the performance of algorithm SWF, and a sensitivity analysis of various parameters provides further insight into the algorithm. The improvement achieved by algorithm SWF becomes even more prominent as the incremented portion of the dataset increases and as the size of the database increases.
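
The partition-by-partition filtering described in the abstract can be illustrated with a minimal Python sketch. It is an assumption-laden simplification and not the authors' exact procedure: the function name cumulative_filter, the restriction to 2-itemsets, the (start, count) bookkeeping, and the toy data are all hypothetical choices made for illustration. Each candidate pair remembers the partition in which it was admitted, and it is kept only while its count since that partition meets the support threshold over the transactions seen since then.

```python
from itertools import combinations
from math import ceil


def cumulative_filter(partitions, min_support):
    """Return candidate 2-itemsets that survive per-partition filtering.

    Hypothetical sketch: `candidates` maps a pair to (start, count), where
    `start` is the index of the partition in which the pair was admitted
    and `count` is its number of occurrences from that partition onward.
    """
    candidates = {}
    sizes = [len(p) for p in partitions]

    for k, partition in enumerate(partitions):
        # Count every pair in this partition; new pairs start at partition k.
        for transaction in partition:
            for pair in combinations(sorted(set(transaction)), 2):
                start, count = candidates.get(pair, (k, 0))
                candidates[pair] = (start, count + 1)

        # Filtering step: drop pairs whose count falls below the threshold
        # computed over the partitions from `start` through k.
        for pair, (start, count) in list(candidates.items()):
            threshold = ceil(min_support * sum(sizes[start:k + 1]))
            if count < threshold:
                del candidates[pair]

    return candidates


# Toy data (assumed for illustration): two partitions of four transactions.
partitions = [
    [["a", "b"], ["a", "b", "c"], ["a", "c"], ["b"]],
    [["a", "b"], ["a", "b", "d"], ["c", "d"], ["c", "d"]],
]
result = cumulative_filter(partitions, 0.5)
print(result)  # {('a', 'b'): (0, 4), ('c', 'd'): (1, 2)}
```

In this toy run the pair (a, b) is carried over from the first partition and remains a candidate, while (c, d) is admitted in the second partition because it is locally frequent there. Survivors are only candidates: (c, d) occurs in 2 of 8 transactions overall, so a confirming scan of the data would still be needed to reject it.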