Lightweight clustering technique for distributed data mining applications

Authors:
Lamine M. Aouad;Nhien-An Le-Khac;Tahar M. Kechadi
Affiliations:
School of Computer Science and Informatics, University College Dublin, Ireland;School of Computer Science and Informatics, University College Dublin, Ireland;School of Computer Science and Informatics, University College Dublin, Ireland
Venue:
ICDM'07 Proceedings of the 7th industrial conference on Advances in data mining: theoretical aspects and applications
Year:
2007

Abstract

Many parallel and distributed clustering algorithms have already been proposed. Most of them aggregate local models according to collected local statistics. In this paper, we propose a lightweight distributed clustering algorithm based on a minimum variance increase criterion, which requires very limited communication overhead. We also introduce the notion of distributed perturbation to improve the globally generated clustering. We show that this algorithm improves the quality of the overall clustering and manages to find the real structure and number of clusters of the global dataset.
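
The abstract does not give the details of the merging step, so the following is only a minimal sketch of how a minimum variance increase criterion could be applied to local cluster summaries. It assumes each site sends just the size, center, and sum of squared errors of its local clusters, and that the global step greedily merges the pair whose merge adds the least to the total sum of squared errors (the standard Ward identity). The names `ClusterSummary`, `merge_summaries`, and the threshold `max_increase` are hypothetical, and the distributed perturbation step is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClusterSummary:
    """Statistics a site would send for one local cluster (assumed format)."""
    size: int
    center: Tuple[float, ...]
    sse: float  # sum of squared distances of members to the center


def variance_increase(a: ClusterSummary, b: ClusterSummary) -> float:
    """Increase in total SSE caused by merging a and b (Ward identity)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a.center, b.center))
    return a.size * b.size / (a.size + b.size) * d2


def merge(a: ClusterSummary, b: ClusterSummary) -> ClusterSummary:
    """Combine two summaries without access to the underlying points."""
    n = a.size + b.size
    center = tuple((a.size * x + b.size * y) / n
                   for x, y in zip(a.center, b.center))
    return ClusterSummary(n, center, a.sse + b.sse + variance_increase(a, b))


def merge_summaries(summaries: List[ClusterSummary],
                    max_increase: float) -> List[ClusterSummary]:
    """Greedily merge the cheapest pair until the cheapest merge would
    raise the SSE by more than max_increase (hypothetical threshold)."""
    clusters = list(summaries)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                inc = variance_increase(clusters[i], clusters[j])
                if best is None or inc < best[0]:
                    best = (inc, i, j)
        inc, i, j = best
        if inc > max_increase:
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


# Summaries from two hypothetical sites: two nearby clusters and one far away.
site_summaries = [
    ClusterSummary(10, (0.0, 0.0), 1.0),
    ClusterSummary(10, (0.2, 0.0), 1.0),
    ClusterSummary(10, (5.0, 5.0), 1.0),
]
global_clusters = merge_summaries(site_summaries, max_increase=1.0)
```

Only a few numbers per local cluster cross the network in this sketch, which is one way the limited communication overhead described above could be obtained.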