Tidset-based parallel FP-tree algorithm for the frequent pattern mining problem on PC clusters

Authors:
Jiayi Zhou;Kun-Ming Yu
Affiliations:
Institute of Engineering Science, Chung Hua University;Department of Computer Science and Information Engineering, Chung Hua University, Hsinchu, Taiwan
Venue:
GPC'08 Proceedings of the 3rd international conference on Advances in grid and pervasive computing
Year:
2008



Abstract

Mining association rules from a transaction-oriented database is a well-known problem in data mining. Frequent patterns are essential for generating association rules, time series analysis, classification, and similar tasks. Frequent pattern mining algorithms fall into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). Because Apriori-like algorithms must scan the database many times, many recent methods use an FP-tree instead. However, even with the pattern growth method, execution time becomes long when the database is large or the given support threshold is low. Parallel and distributed computing is a good strategy for addressing this. Several parallel algorithms have been proposed, but their execution time still increases rapidly as the database grows or when the given minimum threshold is small. This study proposes an efficient parallel and distributed mining algorithm based on the FP-tree structure, the Tidset-based Parallel FP-tree (TPFP-tree). To exchange transactions efficiently, a transaction identification set (Tidset) is used to select transactions directly, without scanning the database. The algorithm was verified on a Linux cluster with 16 computing nodes and compared with a PFP-tree algorithm, using datasets generated by IBM's Quest Synthetic Data Generator. The experimental results show that the algorithm can reduce execution time as the database grows, and that it scales better than the PFP-tree.
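
The abstract does not spell out how the Tidset is built or exchanged between nodes, so the following is only a minimal sketch of the general idea: an index from each item to the IDs of the transactions containing it, which lets a node look up the transactions it needs without rescanning the database. The function names (`build_tidsets`, `frequent_items`, `transactions_for`) and the toy data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the set of transaction IDs (tids) that contain it.

    This takes a single pass over the database; later lookups use the index.
    """
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def frequent_items(tidsets, min_support):
    """Items whose support (number of containing transactions) meets the threshold."""
    return {item for item, tids in tidsets.items() if len(tids) >= min_support}


def transactions_for(items, tidsets):
    """Sorted tids of transactions containing at least one of the given items.

    The answer comes from a union of tidsets, not from a database scan.
    """
    selected = set()
    for item in items:
        selected |= tidsets.get(item, set())
    return sorted(selected)


# Toy database (assumed for illustration): the list index serves as the tid.
transactions = [
    {"a", "b", "c"},
    {"b", "d"},
    {"a", "c"},
    {"b", "c", "e"},
]

tidsets = build_tidsets(transactions)
needed = transactions_for(["a", "d"], tidsets)
print(needed)
```

In a parallel setting, one could imagine each node being assigned a subset of the frequent items and using such a lookup to decide which transactions to request from its peers. That assignment step is an assumption here, not something the abstract describes.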