Mining frequent patterns from transaction-oriented databases is an important problem, because frequent patterns are fundamental to generating association rules, time series, and the like. Most frequent pattern mining algorithms fall into two categories: the generate-and-test approach (Apriori-like) and the pattern-growth approach (FP-tree). In recent years, many techniques have been built on the FP-tree approach because it needs only two database scans. However, the execution time of pattern-growth methods increases rapidly as the database grows or as the given support threshold decreases. Parallel and distributed computing is therefore a good strategy for this problem. Several parallel algorithms have been proposed, but their execution time remains high when the database is large. In this paper, two parallel mining algorithms are proposed for frequent pattern mining on PC clusters and multi-cluster grids: the Tidset-based Parallel FP-tree (TPFP-tree) and the Balanced Tidset-based Parallel FP-tree (BTP-tree). To exchange transactions efficiently, a transaction identification set (Tidset) is used to select transactions directly instead of scanning the database. Because a grid is a heterogeneous computing environment, the proposed BTP-tree balances the load according to the computing ability of the processors. BTP-tree, TPFP-tree, and PFP-tree were implemented, and datasets generated with the IBM Quest Synthetic Data Generator were used to evaluate the performance of TPFP-tree and BTP-tree. The experimental results showed that the TPFP-tree needed less execution time on a PC cluster than the PFP-tree as the database size increased. Moreover, on a multi-cluster grid, the BTP-tree shortened the execution time significantly and balanced the load better than both the TPFP-tree and the PFP-tree.
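The two ideas in the abstract, selecting transactions through a Tidset and balancing work by processor ability, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`build_tidsets`, `select_transactions`, `assign_items`), the per-item cost estimate, the `speeds` list, and the greedy assignment rule are all assumptions made for illustration, since the abstract does not specify how TPFP-tree or BTP-tree perform these steps.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the ascending list of transaction IDs containing it.

    Built in one pass over the data; the transaction ID is assumed to be
    the transaction's position in the list.
    """
    tidsets = defaultdict(list)
    for tid, items in enumerate(transactions):
        for item in set(items):  # ignore duplicate items within a transaction
            tidsets[item].append(tid)
    return dict(tidsets)


def select_transactions(transactions, tidsets, items):
    """Return the transactions that contain at least one of `items`.

    The transaction IDs come from the Tidset index, so the database is
    accessed by ID rather than scanned from start to end.
    """
    tids = sorted({tid for item in items for tid in tidsets.get(item, [])})
    return [transactions[tid] for tid in tids]


def assign_items(item_costs, speeds):
    """Greedily assign items to processors of unequal speed (hypothetical rule).

    Items are taken from most to least costly; each goes to the processor
    whose estimated finish time, (current load + cost) / speed, is lowest.
    """
    loads = [0.0] * len(speeds)
    assignment = [[] for _ in speeds]
    ordered = sorted(item_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for item, cost in ordered:
        target = min(range(len(speeds)),
                     key=lambda p: (loads[p] + cost) / speeds[p])
        assignment[target].append(item)
        loads[target] += cost
    return assignment
```

In this sketch a processor responsible for a set of items would call `select_transactions` with those items to obtain exactly the transactions it needs, and `assign_items` would be given a cost estimate per item (for example, the length of its Tidset) together with a relative speed per processor.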