An Efficient Association Rule Mining Algorithm In Distributed Databases

Authors:
Wu Jian;Li Xing Ming
Venue:
WKDD '08 Proceedings of the First International Workshop on Knowledge Discovery and Data Mining
Year:
2008

Cited by (2):
A load-balanced distributed parallel mining algorithm. Expert Systems with Applications: An International Journal
A fine-grained scheduling strategy for improving the performance of parallel frequent itemsets mining. International Journal of Computational Science and Engineering

Abstract

unication networks based on data mining. Applying sequential algorithms directly to distributed databases is not effective, because it incurs a large communication overhead. In our study, an efficient algorithm, EDMA, is proposed. It minimizes the number of candidate sets and exchanged messages through local and global pruning. At each local site, it runs an improved algorithm, CMatrix, which is used to calculate local support counts. By numbering the global frequent itemsets generated at the end of the k-th iteration from 1 to m, the algorithm encodes every candidate (k+1)-itemset as a pair of those numbers, (x, y), in order to compress the transmitted content and to query the corresponding support counts in CMatrix. Our solution also reduces the average transaction size and the size of the datasets, which in turn reduces scan time. The performance study shows that EDMA has higher running efficiency, lower communication cost, and stronger scalability than the direct application of a sequential algorithm to distributed databases.
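
The pair encoding mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how x and y are chosen, so this sketch assumes the usual Apriori-style join: two frequent k-itemsets that share their first k-1 items produce one candidate (k+1)-itemset, which is then sent as the pair of their numbers instead of the full item list. The function names `number_itemsets`, `encode_candidates`, and `decode` are hypothetical, and the CMatrix structure itself is not modeled because the abstract does not describe it.

```python
from itertools import combinations


def number_itemsets(frequent_k):
    """Number the frequent k-itemsets from 1 to m (assumed: lexicographic order)."""
    ordered = sorted(tuple(sorted(s)) for s in frequent_k)
    return {idx: itemset for idx, itemset in enumerate(ordered, start=1)}


def encode_candidates(numbered):
    """Return candidate (k+1)-itemsets as pairs (x, y) of k-itemset numbers.

    Assumption: a candidate is formed by an Apriori-style join, i.e. the two
    k-itemsets numbered x and y agree on their first k-1 items.
    """
    pairs = []
    for x, y in combinations(sorted(numbered), 2):
        a, b = numbered[x], numbered[y]
        if a[:-1] == b[:-1]:
            pairs.append((x, y))
    return pairs


def decode(pair, numbered):
    """Rebuild the (k+1)-itemset that a pair (x, y) stands for."""
    x, y = pair
    return numbered[x] + (numbered[y][-1],)


# Example: four frequent 2-itemsets yield two candidate 3-itemsets.
numbered = number_itemsets([{"A", "B"}, {"B", "C"}, {"A", "C"}, {"B", "D"}])
pairs = encode_candidates(numbered)
for pair in pairs:
    print(pair, "->", decode(pair, numbered))
```

Under these assumptions, every site that holds the same numbering can exchange two integers per candidate regardless of k, which is one plausible reading of how the encoding compresses the transmitted content.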