A scalable framework for cluster ensembles

Authors:
Prodip Hore; Lawrence O. Hall; Dmitry B. Goldgof
Affiliation (all authors):
Department of Computer Science and Engineering, ENB118, University of South Florida, Tampa, FL 33620, USA
Venue:
Pattern Recognition
Year:
2009

Abstract

An ensemble of clustering solutions, or partitions, may be generated for a number of reasons. If the data set is very large, clustering may be done on disjoint subsets of tractable size. Alternatively, the data may be distributed across different sites, in which case distributed clustering followed by a final merging of the partitions is a natural fit. In this paper, two new approaches to combining partitions, each represented by a set of cluster centers, are introduced. Their advantage is that they produce a final partition of the data whose quality is comparable to that of the best existing approaches, yet they scale to extremely large data sets: they can be 100,000 times faster while using much less memory. The new algorithms are compared against three baselines: the best existing cluster ensemble merging approaches, clustering all the data at once, and a clustering algorithm designed for very large data sets. The comparison is done for both fuzzy and hard k-means-based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label-vector approach or to clustering all the data at once, while providing very large speedups.
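As a rough illustration of combining partitions that are represented only by sets of cluster centers, the sketch below clusters disjoint subsets independently with hard k-means and then merges just the resulting centroids. It is not claimed to be either of the two algorithms introduced in the paper. The function names `kmeans`, `merge_centroids`, and `assign`, the exhaustive permutation matching used to align centroids across partitions, and the size-weighted averaging are all assumptions made for this example.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Partition = Tuple[List[Point], List[int]]  # (centroids, cluster sizes)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 50, seed: int = 0) -> Partition:
    """Plain hard k-means (Lloyd's algorithm) on one subset of the data.

    Only the k centroids and k cluster sizes are returned, so the merge
    step never needs the subset's raw points or per-point labels.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    groups: List[List[Point]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[c]
            for c, g in enumerate(groups)
        ]
    return centroids, [len(g) for g in groups]


def merge_centroids(partitions: List[Partition]) -> List[Point]:
    """HYPOTHETICAL merger, not the paper's algorithm.

    Aligns each partition's centroids to those of the first partition using
    the cheapest one-to-one matching (exhaustive search over permutations,
    feasible only for small k), then averages matched centroids weighted
    by cluster size.
    """
    ref = partitions[0][0]
    k, dim = len(ref), len(ref[0])
    sums = [[0.0] * dim for _ in range(k)]
    weights = [0.0] * k
    for cents, sizes in partitions:
        best = min(
            itertools.permutations(range(k)),
            key=lambda perm: sum(sq_dist(ref[i], cents[perm[i]]) for i in range(k)),
        )
        for i, j in enumerate(best):
            weights[i] += sizes[j]
            for d in range(dim):
                sums[i][d] += sizes[j] * cents[j][d]
    return [
        tuple(v / weights[i] for v in sums[i]) if weights[i] > 0 else ref[i]
        for i in range(k)
    ]


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Label every point with the index of its nearest merged centroid."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c])) for p in points]


# Demo on synthetic data: two well-separated 2-D Gaussian blobs.
rng = random.Random(42)
data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(300)]
data += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(300)]
rng.shuffle(data)

# Cluster three disjoint subsets independently (e.g. chunks of a large file or separate sites).
subsets = [data[i::3] for i in range(3)]
parts = [kmeans(s, k=2, seed=i) for i, s in enumerate(subsets)]
merged = merge_centroids(parts)
labels = assign(data, merged)
print(sorted(merged))
```

Because `merge_centroids` sees only k centroids and k counts per partition, its cost does not depend on the number of data points, whereas a label-vector ensemble method has to handle one label per object per partition. For larger k, the exhaustive permutation search would be replaced by a polynomial-time assignment solver such as the Hungarian algorithm.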