Tidset-based parallel FP-tree algorithm for the frequent pattern mining problem on PC clusters

  • Authors:
  • Jiayi Zhou;Kun-Ming Yu

  • Affiliations:
  • Institute of Engineering Science, Chung Hua University;Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, Taiwan

  • Venue:
  • GPC'08 Proceedings of the 3rd international conference on Advances in grid and pervasive computing
  • Year:
  • 2008

Abstract

Mining association rules from a transaction-oriented database is a problem in data mining. Frequent patterns are essential for generating association rules, time series analysis, classification, etc. Algorithms for mining frequent patterns fall into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). Recently, many FP-tree-based methods have been proposed as replacements for Apriori-like algorithms, because the latter need to scan the database many times. However, even for the pattern growth method, the execution time is long when the database is large or the given support is low. Parallel-distributed computing is a good strategy for solving this problem. Some parallel algorithms have been proposed; however, their execution time increases rapidly when the database grows or when the given minimum threshold is small. In this study, an efficient parallel-distributed mining algorithm based on an FP-tree structure - the Tidset-based Parallel FP-tree (TPFP-tree) - is proposed. To exchange transactions efficiently, a transaction identification set (Tidset) is used to choose transactions directly without scanning the database. The algorithm was verified on a Linux cluster with 16 computing nodes and compared with a PFP-tree algorithm. The dataset used to verify the performance of the algorithms was generated by IBM's Quest Synthetic Data Generator. The experimental results showed that this algorithm can reduce the execution time when the database grows. Moreover, it was also observed that this algorithm had better scalability than the PFP-tree.
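
The Tidset idea mentioned in the abstract can be illustrated with a small single-process sketch. This is not the authors' TPFP-tree implementation: the function names (`build_tidsets`, `frequent_items`, `select_transactions`), the toy transactions, and the support threshold are all hypothetical, and no FP-tree construction or inter-node communication is shown. The sketch only assumes that a Tidset maps each item to the IDs of the transactions containing it, so that the transactions relevant to a group of items can be looked up directly instead of found by rescanning the database.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Build item -> set of transaction IDs (Tidset) in one pass over the data."""
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def frequent_items(tidsets, min_support):
    """An item's support count is simply the size of its Tidset."""
    return {item for item, tids in tidsets.items() if len(tids) >= min_support}


def select_transactions(transactions, tidsets, items):
    """Pick the transactions containing any of `items` via Tidset lookup.

    No scan of `transactions` is needed: the IDs come from the Tidsets.
    """
    tids = set()
    for item in items:
        tids |= tidsets.get(item, set())
    return {tid: transactions[tid] for tid in sorted(tids)}


# Toy database (hypothetical data, for illustration only).
transactions = [
    ["a", "b", "c"],
    ["b", "d"],
    ["a", "c"],
    ["d", "e"],
    ["a", "b"],
]

tidsets = build_tidsets(transactions)
frequent = frequent_items(tidsets, min_support=2)

# Transactions that a node responsible for items "d" and "e" would need.
needed = select_transactions(transactions, tidsets, ["d", "e"])
```

In a parallel setting, a lookup such as `select_transactions` would presumably tell each node which transactions to send to or request from its peers; how the paper actually partitions items and exchanges data is not described in the abstract.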