In many companies, data is distributed among several sites: each site generates its own data and manages its own repository. Analyzing and mining these distributed sources requires distributed data mining techniques that find global patterns representing the complete information. Transmitting the entire local data sets is often unacceptable because of performance considerations, privacy and security concerns, and bandwidth constraints. Traditional data mining algorithms, which demand access to the complete data, are therefore not appropriate for distributed applications, and distributed algorithms are needed to analyze and discover new knowledge in such environments. One of the most important data mining tasks is clustering, which aims at detecting groups of similar data objects. In this paper, we propose a distributed model-based clustering algorithm that uses EM to detect local models in the form of mixtures of Gaussian distributions. We propose an efficient and effective algorithm for deriving and merging these local Gaussian distributions to generate a meaningful global model. In a broad experimental evaluation, we show that our framework is scalable in a highly distributed environment.
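The pipeline outlined in the abstract (local EM at each site, then merging the local Gaussians into a global model) can be illustrated with a minimal one-dimensional sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the quantile-based initialization, the function names `local_em`, `merge_two`, and `merge_models`, the distance threshold, and the moment-matching merge rule. The paper's actual derivation and merging criteria are not given in the abstract.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def local_em(data, k, iters=50):
    """Fit a 1-D Gaussian mixture with EM at one site.

    Returns [(count, mean, var), ...]; only these summaries leave the site.
    Initialization from data quantiles is an illustrative assumption.
    """
    n = len(data)
    ordered = sorted(data)
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weight, mean, and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against collapse
            weights[j] = nj / n
    return [(w * n, m, v) for w, m, v in zip(weights, means, variances)]


def merge_two(a, b):
    """Moment-matched merge of two weighted Gaussians (count, mean, var)."""
    na, ma, va = a
    nb, mb, vb = b
    n = na + nb
    mean = (na * ma + nb * mb) / n
    var = (na * (va + (ma - mean) ** 2) + nb * (vb + (mb - mean) ** 2)) / n
    return (n, mean, var)


def merge_models(local_models, threshold=2.0):
    """Greedily merge neighboring components (hypothetical merge criterion).

    Two components are merged when their means differ by at most
    `threshold` times the root of their average variance.
    Returns [(weight, mean, var), ...] with weights summing to 1.
    """
    components = [c for model in local_models for c in model]
    merged = []
    for comp in sorted(components, key=lambda c: c[1]):
        if merged:
            last = merged[-1]
            scale = math.sqrt((last[2] + comp[2]) / 2)
            if abs(comp[1] - last[1]) <= threshold * scale:
                merged[-1] = merge_two(last, comp)
                continue
        merged.append(comp)
    total = sum(c[0] for c in merged)
    return [(c[0] / total, c[1], c[2]) for c in merged]


def make_site(seed, n=200):
    """Synthetic site data: two well-separated clusters near 0 and 10."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(n)]
            + [rng.gauss(10.0, 1.0) for _ in range(n)])


sites = [make_site(s) for s in (1, 2, 3)]
local_models = [local_em(data, k=2) for data in sites]
global_model = merge_models(local_models)
for weight, mean, var in global_model:
    print(f"weight={weight:.2f} mean={mean:.2f} var={var:.2f}")
```

In this sketch each site transmits only a handful of numbers per component rather than its raw data, which reflects the bandwidth and privacy motivation stated in the abstract. A multivariate version would replace the scalar variance with a covariance matrix and would need a different closeness test between components.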