This paper presents the Collective Hierarchical Clustering (CHC) algorithm for analyzing distributed, heterogeneous data. The algorithm first generates local cluster models and then combines them into a global cluster model of the data. It runs in O(|S|n^2) time, with an O(|S|n) space requirement and an O(n) communication requirement, where n is the number of elements in the data set and |S| is the number of data sites. This is a significant improvement over naive methods, which incur O(n^2) communication cost when the entire distance matrix is transmitted and O(nm) communication cost when the data are centralized, where m is the total number of features. A specific implementation based on single-link clustering is presented, along with results comparing its performance with that of a centralized clustering algorithm and an analysis of the algorithm's complexity in terms of overall computation time and communication requirements.
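To make the communication argument concrete, the following is a minimal sketch of what a single site might compute locally. It is an illustration under stated assumptions, not the CHC implementation: the function name `single_link_merges`, the `(height, rep_a, rep_b)` record layout, and the sample data are all hypothetical, and for simplicity the sketch holds all pairwise distances in memory rather than meeting the O(|S|n) space bound. The step that combines local models into a global model is not shown, since the abstract does not specify it.

```python
from itertools import combinations
from math import dist


def single_link_merges(points):
    """Return the n-1 single-link merges as (height, rep_a, rep_b) tuples.

    Hypothetical helper. Uses Kruskal's algorithm: scanning pairs in order
    of increasing distance and joining separate components reproduces the
    single-link merge order.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances: O(n^2) local work and memory in this sketch.
    # These values stay on the site and are never transmitted.
    pairs = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    merges = []
    for height, i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i
            merges.append((height, root_i, root_j))
    return merges


# Assumed example: one site holding 2 of the m features for n = 5 objects.
site_features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
local_model = single_link_merges(site_features)
n = len(site_features)
print(len(local_model), "merge records vs", n * (n - 1) // 2, "pairwise distances")
```

The local model here has n - 1 merge records, so sending it grows linearly in n, whereas sending the site's distance matrix would require n(n - 1)/2 values.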