TOPSIL-Miner: an efficient algorithm for mining top-K significant itemsets over data streams

  • Authors:
  • Bei Yang; Houkuan Huang

  • Affiliations:
  • School of Information Engineering, Zhengzhou University, Zhengzhou, China; School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2010

Abstract

Frequent itemset mining over data streams has become a hot topic in data mining and knowledge discovery in recent years, and has been applied in a variety of areas. However, setting a minimum support threshold requires domain knowledge, and an unreasonable threshold places a heavy burden on users. It is therefore attractive to let users ask for the top-K frequent itemsets over a data stream instead. In this paper, a dynamic, incremental, approximate algorithm, TOPSIL-Miner, is presented to mine top-K significant itemsets in landmark windows. A new data structure, TOPSIL-Tree, is designed to store the potentially significant itemsets, and further data structures (the maximum support list, the ordered item list, TOPSET and the minimum support list) are devised to maintain information about the mining results. Moreover, three optimization strategies are exploited to reduce the time and space cost of the algorithm: (1) pruning trivial nodes in the current data stream, (2) raising the mining support threshold adaptively and heuristically during the mining process, and (3) raising the pruning threshold dynamically. The accuracy of the algorithm is also analyzed. Extensive experiments are performed to evaluate the effectiveness, efficiency and precision of the algorithm.
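
To make the top-K formulation concrete, the following is a minimal sketch, not TOPSIL-Miner itself. It assumes exact counting of every itemset up to a small size over a landmark window (all transactions since a fixed starting point) and uses no TOPSIL-Tree, pruning, or approximation. The function name `top_k_itemsets` and the parameter `max_size` are hypothetical and chosen only for illustration. It shows how asking for K itemsets yields an implied minimum support, so the user does not have to choose one in advance.

```python
from collections import Counter
from itertools import combinations


def top_k_itemsets(stream, k, max_size=2):
    """Exact top-K itemsets by support over a landmark window.

    Illustrative baseline only: it counts every itemset of up to
    `max_size` items, which the paper's algorithm avoids by pruning.
    """
    counts = Counter()
    n = 0  # transactions seen since the landmark
    for transaction in stream:
        n += 1
        items = sorted(set(transaction))  # canonical order, no duplicates
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Highest count first; ties broken by itemset for a deterministic result.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    # The K-th itemset's relative support acts as the implied threshold.
    implied_min_support = top[-1][1] / n if top else 0.0
    return top, implied_min_support


if __name__ == "__main__":
    stream = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
    top, threshold = top_k_itemsets(stream, k=3)
    print(top)
    print(threshold)
```

Exact counting like this grows quickly with the number of distinct items, which is the cost that an approximate, pruning-based approach such as the one described in the abstract is meant to reduce.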