A load-balanced distributed parallel mining algorithm

Authors:
Kun-Ming Yu;Jiayi Zhou;Tzung-Pei Hong;Jia-Ling Zhou
Affiliations:
Department of Computer Science and Information Engineering, Chung Hua University, 707, Sec. 2, WuFu Rd., HsinChu 300, Taiwan, ROC;Institute of Engineering and Science, Chung Hua University, 707, Sec. 2, WuFu Rd., HsinChu 300, Taiwan, ROC;Department of Computer Science and Information Engineering, National University of Kaohsiung, 700, Kaohsiung University Rd, Kaohsiung 811, Taiwan, ROC;Department of Information Management, Chung Hua University, 707, Sec. 2, WuFu Rd., HsinChu 300, Taiwan, ROC
Venue:
Expert Systems with Applications: An International Journal
Year:
2010


Abstract

Due to the exponential growth of information worldwide, companies must handle an ever-growing amount of digital data. One of the most important challenges in data mining is finding the relationships among data quickly and correctly. The Apriori algorithm has been the most popular technique for finding frequent patterns. However, applying it requires scanning the database many times to count the occurrences of a huge number of candidate itemsets. Parallel and distributed computing is an effective strategy for accelerating the mining process. In this paper, the Distributed Parallel Apriori (DPA) algorithm is proposed as a solution to this problem. In the proposed method, metadata are stored in the form of Transaction Identifiers (TIDs), so that only a single scan of the database is needed. The approach also takes itemset counts into consideration, thereby generating a balanced workload among processors and reducing processor idle time. Experiments on a PC cluster with 16 computing nodes were conducted to evaluate the performance of the proposed approach and compare it with other parallel mining algorithms. The experimental results show that the proposed approach outperforms the others, especially when the minimum support is low.
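
The two ideas named in the abstract, TID-based storage and count-aware workload balancing, can be illustrated with a small single-process sketch. This is not the authors' DPA implementation: the function names (`build_tid_lists`, `frequent_pairs`, `balance_by_weight`), the toy transactions, and the greedy heaviest-first assignment rule are all assumptions chosen for illustration, and no actual message passing between nodes is shown.

```python
from collections import defaultdict
from itertools import combinations


def build_tid_lists(transactions):
    """Scan the database once, mapping each item to the set of TIDs containing it."""
    tid_lists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tid_lists[item].add(tid)
    return dict(tid_lists)


def frequent_pairs(tid_lists, min_count):
    """Count 2-itemsets by intersecting TID sets; the database is not rescanned."""
    frequent_items = sorted(i for i, t in tid_lists.items() if len(t) >= min_count)
    result = {}
    for a, b in combinations(frequent_items, 2):
        support = len(tid_lists[a] & tid_lists[b])
        if support >= min_count:
            result[(a, b)] = support
    return result


def balance_by_weight(weights, num_workers):
    """Assumed balancing rule: give the heaviest remaining task to the least-loaded worker."""
    loads = [0] * num_workers
    assignment = [[] for _ in range(num_workers)]
    for task, weight in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))
        assignment[target].append(task)
        loads[target] += weight
    return assignment, loads


# Toy database (hypothetical data): the list index serves as the TID.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["b", "c"],
    ["a", "b", "c", "d"],
]

tid_lists = build_tid_lists(transactions)
pairs = frequent_pairs(tid_lists, min_count=2)

# Hypothetical per-item weights, e.g. the number of candidates each prefix generates.
assignment, loads = balance_by_weight({"a": 3, "b": 2, "c": 1, "d": 0}, num_workers=2)
```

In this sketch the support of a candidate is the size of a set intersection, which is why one pass over the transactions suffices. The weights passed to `balance_by_weight` stand in for whatever itemset-count measure a real scheduler would use; a plain round-robin split of the same items would leave one worker with more candidate work than the other.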