An ensemble of clustering solutions, or partitions, may be generated for a number of reasons. If the data set is very large, clustering may be done on disjoint subsets of tractable size. The data may also be distributed across different sites, in which case a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, each represented by a set of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of the data that is comparable to those of the best existing approaches, yet they scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, against clustering all the data at once, and against a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or to clustering all the data at once, while providing very large speedups.
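The general idea of merging partitions that are represented only by their cluster centers can be illustrated with a small sketch. This is not the paper's algorithm, whose details the abstract does not give. It is one plausible centroid-based merge under stated assumptions: each disjoint subset is clustered with hard k-means, each resulting center is weighted by the number of points it absorbed, and the pooled centers are clustered again to obtain the final centers. The function names (`kmeans`, `merge_by_centroids`, `nearest`), the farthest-first initialization, and the mass weighting are all illustrative choices, not taken from the source.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, weights=None, iters=100):
    """Weighted hard k-means; returns (centers, mass assigned to each center)."""
    if weights is None:
        weights = [1.0] * len(points)
    # Assumption: deterministic farthest-first initialization (illustrative).
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    dim = len(points[0])
    mass = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            mass[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Keep the old center if a cluster received no weight.
        new_centers = [
            [s / mass[j] for s in sums[j]] if mass[j] > 0 else centers[j]
            for j in range(k)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, mass


def merge_by_centroids(subsets, k):
    """Cluster each subset, then cluster the pooled, mass-weighted centers."""
    pooled, pooled_w = [], []
    for subset in subsets:
        centers, mass = kmeans(subset, k)
        for c, m in zip(centers, mass):
            if m > 0:  # skip empty clusters
                pooled.append(tuple(c))
                pooled_w.append(m)
    final_centers, _ = kmeans(pooled, k, weights=pooled_w)
    return final_centers


def nearest(point, centers):
    """Index of the closest final center (hard assignment)."""
    return min(range(len(centers)), key=lambda c: sq_dist(point, centers[c]))


# Toy data: two well-separated groups spread over three disjoint subsets.
subsets = [
    [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)],
    [(0.0, 1.0), (1.0, 1.0), (10.0, 11.0), (11.0, 11.0)],
    [(0.5, 0.5), (10.5, 10.5), (0.5, 0.0), (10.5, 10.0)],
]
centers = merge_by_centroids(subsets, k=2)
labels = [nearest(p, centers) for s in subsets for p in s]
```

Only the small set of centers and their masses moves between the per-subset stage and the merge stage, which is why an approach of this kind can need far less memory than methods that combine full label vectors. In the toy run, each final center lands at the mean of its group.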