Parallel TID-based frequent pattern mining algorithm on a PC Cluster and grid computing system

Authors:
Kun-Ming Yu; Jiayi Zhou
Affiliations:
Department of Computer Science and Information Engineering, Chung Hua University, 707, Section 2, WuFu Road, HsinChu 300, Taiwan, ROC; Institute of Engineering and Science, Chung Hua University, 707, Section 2, WuFu Road, HsinChu 300, Taiwan, ROC
Venue:
Expert Systems with Applications: An International Journal
Year:
2010

Abstract

The mining of frequent patterns from transaction-oriented databases is an important subject. Frequent patterns are fundamental to tasks such as generating association rules and analyzing time series. Most frequent pattern mining algorithms fall into two categories: the generate-and-test approach (Apriori-like) and the pattern-growth approach (FP-tree). In recent years, many frequent pattern mining techniques have been built on the FP-tree approach because it requires only two database scans. However, the execution time of pattern-growth methods increases rapidly as the database grows or as the minimum support threshold decreases. Parallel and distributed computing is therefore a good strategy for addressing this problem. Several parallel algorithms have been proposed, but their execution time remains high when the database is large. In this paper, two parallel algorithms are proposed for frequent pattern mining on PC Clusters and multi-cluster grids: the Tidset-based Parallel FP-tree (TPFP-tree) and the Balanced Tidset-based Parallel FP-tree (BTP-tree). To exchange transactions efficiently, a transaction identification set (Tidset) is used to select transactions directly instead of scanning the database. Because a grid system is a heterogeneous computing environment, the proposed BTP-tree balances the workload according to the computing ability of each processor. The BTP-tree, TPFP-tree, and PFP-tree were implemented, and datasets generated with the IBM Quest Synthetic Data Generator were used to evaluate the performance of the TPFP-tree and BTP-tree. The experimental results showed that, as the database size increased, the TPFP-tree required less execution time than the PFP-tree on a PC Cluster. Moreover, on a multi-cluster grid the BTP-tree shortened the execution time significantly and achieved better load balance than both the TPFP-tree and the PFP-tree.
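The claim that the FP-tree needs only two database scans can be illustrated with a minimal, single-process sketch of standard FP-tree construction. This is not the paper's parallel code. The first pass counts item supports. The second pass inserts each transaction's frequent items, sorted by descending support, into a prefix tree so that common prefixes share nodes. The names `FPNode` and `build_fp_tree`, the use of an absolute support count, and alphabetical tie-breaking are assumptions made for illustration.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, its count, its parent, and child links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree with exactly two passes over `transactions`.

    min_support is an absolute count (an assumption; it could also be a ratio).
    Returns (root, header), where header maps each frequent item to the
    list of tree nodes that carry that item.
    """
    # Scan 1: count the support of every single item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None, None)
    header = {item: [] for item in frequent}

    # Scan 2: insert each transaction's frequent items in descending support
    # order (ties broken by item name) so shared prefixes reuse nodes.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header
```

With the database `[["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]` and `min_support=2`, item `d` is pruned after the first pass. The transactions `{a, b, c}` and `{a, b}` then share the prefix path `a → b`, and the root ends up with two children, `a` (count 3) and `b` (count 1).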
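The abstract states that the TPFP-tree and BTP-tree use a transaction identification set (Tidset) to select transactions directly instead of rescanning the database. It does not say how Tidsets are stored or exchanged between processors. The following single-process sketch works under assumed conventions. Transaction IDs are list indices, and a processor is handed a hypothetical list of `assigned_items` for which it needs every containing transaction. The helper names `build_tidsets` and `select_transactions` are also hypothetical.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the set of transaction IDs (TIDs) that contain it.

    Assumption: a transaction's TID is simply its index in the list.
    """
    tidsets = defaultdict(set)
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets[item].add(tid)
    return dict(tidsets)


def select_transactions(transactions, tidsets, assigned_items):
    """Return the transactions a processor needs for its assigned items.

    Rather than rescanning the whole database, take the union of the
    Tidsets of the assigned items and fetch those transactions by TID.
    Items with no Tidset (never seen) contribute nothing.
    """
    needed = set()
    for item in assigned_items:
        needed |= tidsets.get(item, set())
    return {tid: transactions[tid] for tid in sorted(needed)}
```

For the database `[["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "d"]]`, the Tidset of `c` is `{0, 2}`. Selecting for `["c"]` therefore touches only transactions 0 and 2, and the other two rows are never read.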
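The abstract says only that the BTP-tree balances the load according to each processor's computing ability. It does not give the actual balancing rule. As one possible illustration, this sketch substitutes a well-known greedy heuristic for processors of unequal speed. Tasks are taken largest first, and each task goes to the processor whose estimated finish time (load divided by relative power) would be smallest after accepting it. The per-task costs, the power values, and the function name `assign_weighted` are hypothetical. In a frequent-pattern setting a task might be the mining work for one frequent item, with its cost estimated from that item's support, but that mapping is also an assumption.

```python
def assign_weighted(task_costs, powers):
    """Greedily assign tasks to processors of unequal computing power.

    task_costs: dict mapping task -> estimated cost (hypothetical cost model).
    powers: list of relative computing powers, one per processor.
    Returns a list of task lists, one per processor, in input order.
    """
    assignment = [[] for _ in powers]
    loads = [0.0] * len(powers)
    # Largest tasks first; ties broken by task name for determinism.
    for task, cost in sorted(task_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        # Pick the processor whose estimated finish time after taking this
        # task is smallest; min() keeps the lowest index on ties.
        best = min(range(len(powers)),
                   key=lambda p: (loads[p] + cost) / powers[p])
        assignment[best].append(task)
        loads[best] += cost
    return assignment
```

For costs `{"a": 6, "b": 4, "c": 3, "d": 2, "e": 1}` and powers `[2.0, 1.0]`, the sketch returns `[["a", "c", "d"], ["b", "e"]]`. The faster processor carries 11 units of work (estimated time 5.5) and the slower one carries 5 units (time 5), so the two finish at nearly the same time.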