Collective, Hierarchical Clustering from Distributed, Heterogeneous Data

Authors:
Erik L. Johnson; Hillol Kargupta
Venue:
Revised Papers from Large-Scale Parallel Data Mining, Workshop on Large-Scale Parallel KDD Systems, SIGKDD
Year:
1999


Abstract

This paper presents the Collective Hierarchical Clustering (CHC) algorithm for analyzing distributed, heterogeneous data. The algorithm first generates local cluster models and then combines them to generate the global cluster model of the data. The proposed algorithm runs in O(|S|n^2) time, with an O(|S|n) space requirement and an O(n) communication requirement, where n is the number of elements in the data set and |S| is the number of data sites. This approach shows significant improvement over naive methods, which incur O(n^2) communication costs when the entire distance matrix is transmitted and O(nm) communication costs when the data are centralized, where m is the total number of features. A specific implementation based on single-link clustering is presented, along with results comparing its performance with that of a centralized clustering algorithm. An analysis of the algorithm's complexity, in terms of overall computation time and communication requirements, is also presented.
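
To make the communication comparison in the abstract concrete, the following minimal sketch counts how many numeric values would be transmitted under three schemes. It is an illustration, not the authors' implementation. The function name `communication_costs` and the constant factors are assumptions: each site is assumed to send the upper triangle of its own distance matrix, and a local dendrogram over n elements is assumed to be encoded as n - 1 merges of three values each (two cluster identifiers and a merge height). Only the growth rates, O(n^2), O(nm), and O(n), come from the abstract.

```python
def communication_costs(n, m, num_sites):
    """Count values transmitted under three illustrative schemes.

    n         -- number of elements in the data set
    m         -- total number of features across all sites
    num_sites -- number of data sites (|S| in the abstract)

    The constants below are assumptions made for illustration only.
    """
    return {
        # Each site sends the upper triangle of its n x n distance matrix.
        "distance_matrices": num_sites * n * (n - 1) // 2,
        # Every feature value of every element is shipped to one site.
        "centralize_raw_data": n * m,
        # Each site sends a dendrogram: n - 1 merges, 3 values per merge
        # (assumed encoding: two cluster ids and a merge height).
        "local_dendrograms": num_sites * 3 * (n - 1),
    }


if __name__ == "__main__":
    for scheme, count in communication_costs(n=1000, m=50, num_sites=4).items():
        print(f"{scheme}: {count}")
```

Under these assumptions, the dendrogram-based scheme grows linearly in n for a fixed number of sites, whereas sending distance matrices grows quadratically and centralizing the data grows with the product of n and the feature count m.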