CLAIM: an efficient method for relaxed frequent closed itemsets mining over stream data

  • Authors:
  • Guojie Song;Dongqing Yang;Bin Cui;Baihua Zheng;Yunfeng Liu;Kunqing Xie

  • Affiliations:
  • School of Electronic Engineering and Computer Science, Peking University, Beijing, China and National Laboratory on Machine Perception, Peking University, Beijing;School of Electronic Engineering and Computer Science, Peking University, Beijing, China;School of Electronic Engineering and Computer Science, Peking University, Beijing, China;School of Information System, Singapore Management University, Singapore;Computer Center of Peking University, Beijing;National Laboratory on Machine Perception, Peking University, Beijing

  • Venue:
  • DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
  • Year:
  • 2007

Abstract

Recently, frequent itemset mining over data streams has attracted much attention. However, mining closed itemsets from data streams has not been well addressed. The main difficulty lies in the high maintenance cost that arises from the exact definition of closed itemsets and the dynamic nature of data streams. In a data stream scenario, it is sufficient to mine approximate frequent closed itemsets rather than exact ones. Such a compact but close-enough frequent itemset is called a relaxed frequent closed itemset. In this paper, we first introduce the concept of RC (Relaxed frequent Closed itemsets), which is a generalized form of approximation. We also propose a novel mechanism, CLAIM, which stands for CLosed Approximated Itemset Mining, to support efficient mining of RCs. CLAIM adopts a bipartite graph model to store frequent closed itemsets, uses a Bloom-filter-based hash function to speed up the update of drifted itemsets, and builds a compact HR-tree structure to efficiently maintain the RCs and support the mining process. An experimental study is conducted, and the results demonstrate the effectiveness and efficiency of our approach in handling frequent closed itemset mining over data streams.
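
For readers unfamiliar with the exact model that the abstract contrasts against, the following is a minimal brute-force sketch of exact frequent closed itemsets over a small window of transactions. It is not CLAIM and does not implement the relaxed (RC) definition, which the abstract does not spell out. An itemset is treated as closed when no proper superset has the same support. The function names, the sample window, and the `min_support` value are assumptions made for illustration only.

```python
from itertools import combinations


def support_counts(transactions):
    """Count, by brute force, how many transactions contain each non-empty itemset."""
    counts = {}
    for transaction in transactions:
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts


def frequent_closed_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets that are frequent and closed.

    Closed means: no proper superset has exactly the same support.
    """
    counts = support_counts(transactions)
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    closed = {}
    for itemset, count in frequent.items():
        has_equal_superset = any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


# Hypothetical window of four transactions (illustrative data only).
window = [["a", "b", "c"], ["a", "b"], ["a", "b", "c"], ["a", "c"]]
result = frequent_closed_itemsets(window, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this window, {b} is frequent but not closed because {a, b} has the same support, which is why the closed set is more compact than the full set of frequent itemsets. The enumeration is exponential in transaction length, so the sketch only illustrates the definition on a static window; it says nothing about how the result would be maintained as a stream drifts.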