Parallel Hierarchical Clustering on Market Basket Data

Authors:
Baoying Wang; Qin Ding; Imad Rahal
Venue:
ICDMW '08 Proceedings of the 2008 IEEE International Conference on Data Mining Workshops
Year:
2008

Citing 0
Cited 1

Clustering performance data efficiently at massive scales

Proceedings of the 24th ACM International Conference on Supercomputing

Abstract

Data clustering has proven to be a promising data mining technique, and there have recently been many attempts to cluster market-basket data. In this paper, we propose a parallelized hierarchical clustering approach for market-basket data (PH-Clustering), implemented using MPI. Based on an analysis of the major clustering steps, we adopt a partly local, partly global approach that decreases computation time while keeping communication time to a minimum. Load balancing is considered throughout, especially at the data partitioning stage. Our experimental results demonstrate that PH-Clustering yields a large speedup over sequential clustering. When the number of processors is large, the speedup becomes more significant as the data size grows. Our results also show that the number of items has more impact on the performance of PH-Clustering than the number of transactions.
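The abstract does not spell out how the data is partitioned or how similarity is measured, so the following is only an illustrative sketch of what load-balanced partitioning of market-basket transactions could look like. It is not the PH-Clustering algorithm. The function names `partition_transactions` and `jaccard`, the greedy largest-first assignment, and the use of Jaccard similarity are all assumptions made for illustration, and the example runs in a single process rather than under MPI.

```python
import heapq


def partition_transactions(transactions, num_workers):
    """Assign transaction indices to workers so item counts are roughly even.

    Hypothetical helper: a greedy largest-first heuristic, not the
    partitioning scheme from the paper.
    """
    # Min-heap of (current load, worker id); load = total items assigned.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    parts = [[] for _ in range(num_workers)]

    # Visit the largest transactions first so small ones can even out the loads.
    order = sorted(range(len(transactions)),
                   key=lambda i: len(transactions[i]),
                   reverse=True)
    for i in order:
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        parts[w].append(i)
        heapq.heappush(heap, (load + len(transactions[i]), w))
    return parts


def jaccard(a, b):
    """Jaccard similarity of two item sets (an assumed similarity measure)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


if __name__ == "__main__":
    baskets = [
        {"milk", "bread"},
        {"milk"},
        {"bread", "eggs", "milk"},
        {"eggs"},
    ]
    parts = partition_transactions(baskets, num_workers=2)
    for w, idxs in enumerate(parts):
        load = sum(len(baskets[i]) for i in idxs)
        print(f"worker {w}: transactions {idxs}, total items {load}")
```

In an MPI setting, each list in `parts` would correspond to the transactions handled by one rank. How local results are then merged globally is specific to the paper and is not reproduced here.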