One of the most important problems in data mining is the discovery of association rules in large databases. In our previous study, we proposed parallel algorithms and candidate-duplication-based load balancing strategies for mining generalized association rules, and showed that these algorithms attain good performance on a 16-node parallel computer system. However, as the number of nodes increases, it becomes difficult to achieve an even workload distribution. In this paper, we present a candidate-partition-based load balancing strategy for parallel generalized association rule mining. This strategy partitions the candidate itemsets so that the number of candidate probes is equalized across nodes, using support counts estimated from the information of the previous pass. Moreover, we implement the parallel algorithms and load balancing strategies for mining generalized association rules on a cluster of 100 PCs interconnected with an ATM network, and analyze their performance using a large transaction dataset. Our experiments show that the load balancing strategy, which partitions the candidate itemsets according to the distribution of candidate probes and duplicates the frequently occurring candidate itemsets, attains high performance and achieves good workload distribution on the 100-PC cluster.
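The idea of partitioning candidates by estimated probe count can be illustrated with a minimal sketch. The abstract does not describe the paper's actual partitioning procedure, so the greedy "heaviest candidate to the least-loaded node" heuristic below is a stand-in assumption. The function name `partition_candidates`, the `duplicate_top` parameter, and the sample estimates are hypothetical and do not come from the paper.

```python
import heapq


def partition_candidates(estimated_probes, num_nodes, duplicate_top=0):
    """Assign candidate itemsets to nodes so estimated probe loads are roughly equal.

    estimated_probes: dict mapping a candidate itemset (tuple) to the number of
        probes expected for it, e.g. estimated from previous-pass support counts.
    num_nodes: number of nodes that share the partitioned candidates.
    duplicate_top: how many of the most frequently probed candidates are
        duplicated on every node instead of being partitioned (assumption).
    """
    # Heaviest candidates first; ties broken by itemset for determinism.
    ranked = sorted(estimated_probes.items(), key=lambda kv: (-kv[1], kv[0]))
    duplicated = [cand for cand, _ in ranked[:duplicate_top]]
    remaining = ranked[duplicate_top:]

    # Min-heap of (current estimated load, node id).
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    assignment = {node: [] for node in range(num_nodes)}
    loads = [0] * num_nodes

    # Greedy heuristic (not the paper's method): give the next heaviest
    # candidate to the node with the smallest estimated load so far.
    for cand, probes in remaining:
        load, node = heapq.heappop(heap)
        assignment[node].append(cand)
        loads[node] = load + probes
        heapq.heappush(heap, (loads[node], node))

    return assignment, loads, duplicated


# Hypothetical estimates for six candidate 2-itemsets.
estimates = {
    ("a", "b"): 90, ("a", "c"): 60, ("b", "c"): 50,
    ("b", "d"): 40, ("c", "d"): 30, ("a", "d"): 10,
}
assignment, loads, duplicated = partition_candidates(
    estimates, num_nodes=2, duplicate_top=1
)
```

In this sketch, duplicated candidates are excluded from the partitioned load because every node holds a copy and counts them locally. How the paper accounts for their cost is not stated in the abstract.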