Distributed mining of maximal frequent itemsets from databases on a cluster of workstations

  • Authors:
  • S. M. Chung; C. Luo

  • Affiliations:
  • Dept. of Comput. Sci. & Eng., Wright State Univ., Dayton, OH, USA; Dept. of Comput. Sci. & Eng., Wright State Univ., Dayton, OH, USA

  • Venue:
  • CCGRID '04: Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid
  • Year:
  • 2004

Abstract

In this paper, we propose a new algorithm, named Distributed Max-Miner (DMM), for mining maximal frequent itemsets from databases. A frequent itemset is maximal if none of its supersets is frequent. DMM requires very low communication and synchronization overhead in distributed computing systems. DMM has two phases: a local mining phase and a global mining phase. During the local mining phase, each node mines its local database to discover the local maximal frequent itemsets, and these itemsets then form a set of maximal candidate itemsets for the top-down search in the subsequent global mining phase. A new prefix-tree data structure is developed to facilitate the storage and counting of global candidate itemsets of different sizes. The global mining phase, which uses this prefix tree, can work with any local mining algorithm. We implemented DMM on a cluster of workstations and evaluated its performance for various cases. DMM performs better than other sequential and parallel algorithms, and its performance scales well, even when the databases contain large maximal frequent itemsets (i.e., long patterns).
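The abstract describes DMM only at a high level, so the following is a minimal Python sketch of the ideas it names, not the authors' implementation. Assumptions: transactions are Python sets of comparable items; each partition stands in for one workstation's local database and is processed sequentially (no real communication); `min_ratio` is a relative minimum support; a brute-force miner stands in for the local mining algorithm; and the global phase is a simplified top-down search in which an infrequent candidate is replaced by its subsets one item smaller. All names (`Node`, `CandidateTree`, `maximal`, `local_maximal`, `distributed_mfi`) are hypothetical. The prefix tree holds candidates of different sizes together, so a single pass over each transaction counts all of them.

```python
from itertools import combinations


class Node:
    """Prefix-tree node; `count` is None unless the path to it is a candidate."""

    def __init__(self):
        self.children = {}  # item -> Node
        self.count = None


class CandidateTree:
    """Stores candidate itemsets of different sizes in one trie and counts their support."""

    def __init__(self, itemsets):
        self.root = Node()
        self.ends = {}  # frozenset(candidate) -> node where its path ends
        for itemset in itemsets:
            node = self.root
            for item in sorted(itemset):  # candidates with a common prefix share nodes
                node = node.children.setdefault(item, Node())
            node.count = 0
            self.ends[frozenset(itemset)] = node

    def count_transaction(self, transaction):
        self._walk(self.root, sorted(transaction), 0)

    def _walk(self, node, items, start):
        # Items are distinct and sorted, so every candidate contained in the
        # transaction is reached by exactly one path and counted once.
        for i in range(start, len(items)):
            child = node.children.get(items[i])
            if child is not None:
                if child.count is not None:
                    child.count += 1
                self._walk(child, items, i + 1)

    def counts(self):
        return {itemset: node.count for itemset, node in self.ends.items()}


def maximal(itemsets):
    """Keep only itemsets that have no proper superset in the collection."""
    sets = set(itemsets)
    return {s for s in sets if not any(s < t for t in sets)}


def local_maximal(db, min_ratio):
    """Brute-force stand-in for a real local miner; only suitable for tiny data."""
    items = sorted(set().union(*db))
    frequent = [
        frozenset(combo)
        for k in range(1, len(items) + 1)
        for combo in combinations(items, k)
        if sum(1 for t in db if set(combo) <= t) >= min_ratio * len(db)
    ]
    return maximal(frequent)


def distributed_mfi(partitions, min_ratio):
    """Simplified local phase plus top-down global phase, simulated on one machine."""
    total = sum(len(part) for part in partitions)
    # Local phase: each partition plays the role of one node's local database.
    candidates = set()
    for part in partitions:
        candidates |= local_maximal(part, min_ratio)
    frequent, seen = set(), set()
    # Global phase: count all candidates (of any size) in one prefix tree per round.
    while candidates:
        tree = CandidateTree(candidates)
        for part in partitions:  # on a cluster, each node would count its own part
            for transaction in part:
                tree.count_transaction(transaction)
        seen |= candidates
        next_level = set()
        for itemset, count in tree.counts().items():
            if count >= min_ratio * total:
                frequent.add(itemset)
            elif len(itemset) > 1:
                # Top-down step: replace an infrequent candidate by its
                # subsets that are one item smaller.
                next_level.update(
                    frozenset(sub)
                    for sub in combinations(sorted(itemset), len(itemset) - 1)
                )
        # Skip candidates already counted or lying under a known frequent itemset.
        candidates = {c for c in next_level - seen if not any(c < f for f in frequent)}
    return maximal(frequent)
```

For example, with partitions `[{"a", "b", "c"}, {"a", "b", "c"}, {"d"}]` and `[{"a", "d"}, {"b", "d"}, {"c", "d"}]` and `min_ratio=0.5`, the local maximal itemsets are {a, b, c} and {d}. {a, b, c} occurs in only 2 of the 6 transactions, so the search descends through its 2-item subsets (also 2 of 6 each) to the singletons, and `distributed_mfi` returns {a}, {b}, {c}, {d}. The relative threshold in the local phase matters: an itemset that is frequent in the whole database is frequent in at least one partition, so every global maximal frequent itemset lies under some local one. Pruning candidates below an already-frequent itemset is safe because a maximal frequent itemset is never a proper subset of another frequent itemset.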