Mining association rules from a transaction-oriented database is a well-studied problem in data mining. Frequent patterns are essential for generating association rules, time series analysis, classification, and similar tasks. Algorithms for mining frequent patterns fall into two categories: the generate-and-test approach (Apriori-like) and the pattern growth approach (FP-tree). Because Apriori-like algorithms need to scan the database many times, many recently proposed methods use an FP-tree instead. However, even with the pattern growth approach, execution time is long when the database is large or the given minimum support is low. Parallel-distributed computing is a good strategy for addressing this problem. Several parallel algorithms have been proposed, but their execution time increases rapidly as the database grows or as the given minimum support threshold decreases. This study proposes an efficient parallel-distributed mining algorithm based on the FP-tree structure, the Tidset-based Parallel FP-tree (TPFP-tree). To exchange transactions efficiently, the algorithm uses a transaction identification set (Tidset) to select transactions directly, without scanning the database. The algorithm was evaluated on a Linux cluster with 16 computing nodes and compared with a PFP-tree algorithm, using a dataset generated by IBM's Quest Synthetic Data Generator. The experimental results showed that the algorithm can reduce execution time as the database grows and that it scales better than the PFP-tree.
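
The Tidset idea can be illustrated with a minimal Python sketch. It is not the TPFP-tree implementation: the function names (`build_tidsets`, `frequent_items`, `select_transactions`), the in-memory list of transactions, and the absolute support count are all assumptions made for illustration. The sketch only shows how one pass over the data yields, for each item, the set of transaction IDs containing it, so that later steps can pick transactions by ID instead of rescanning the database.

```python
from collections import defaultdict


def build_tidsets(transactions):
    """Map each item to the set of transaction IDs (tids) that contain it.

    This is the only full pass over the transactions in this sketch.
    """
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def frequent_items(tidsets, min_support):
    """Return items whose support (size of their tidset) meets min_support."""
    return {
        item: len(tids)
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }


def select_transactions(transactions, tidsets, item):
    """Fetch the transactions containing `item` by tid lookup, not by scanning."""
    return [transactions[tid] for tid in sorted(tidsets.get(item, ()))]


# Hypothetical toy database: each transaction is a list of items.
transactions = [
    ["a", "b", "c"],
    ["b", "c"],
    ["a", "c", "d"],
    ["b", "d"],
]

tidsets = build_tidsets(transactions)
frequent = frequent_items(tidsets, min_support=3)
with_a = select_transactions(transactions, tidsets, "a")
```

In a parallel setting, a node that needs the transactions for a given item could request them by ID in this way. How the TPFP-tree algorithm actually partitions items and exchanges transactions between nodes is not specified in the abstract and is not modeled here.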