Clustering ensembles combine multiple partitions of data into a single clustering solution of better quality. Inspired by the success of supervised bagging and boosting algorithms, we propose non-adaptive and adaptive resampling schemes for the integration of multiple independent and dependent clusterings. We investigate the effectiveness of bagging techniques, comparing the efficacy of sampling with and without replacement, in conjunction with several consensus algorithms. In our adaptive approach, individual partitions in the ensemble are sequentially generated by clustering specially selected subsamples of the given dataset. The sampling probability for each data point dynamically depends on the consistency of its previous assignments in the ensemble. New subsamples are then drawn to increasingly focus on the problematic regions of the input feature space. A measure of data point clustering consistency is therefore defined to guide this adaptation. Experimental results show improved stability and accuracy for clustering structures obtained via bootstrapping, subsampling, and adaptive techniques. A meaningful consensus partition for an entire set of data points emerges from multiple clusterings of bootstraps and subsamples. Subsamples of small size can reduce computational cost and measurement complexity for many unsupervised data mining tasks with distributed sources of data. This empirical study also compares the performance of adaptive and non-adaptive clustering ensembles using different consensus functions on a number of datasets. By focusing attention on the data points with the least consistent clustering assignments, one can better approximate the inter-cluster boundaries, or at least create diversity in the boundaries, which improves clustering accuracy and convergence speed as a function of the number of partitions in the ensemble.
The comparison of adaptive and non-adaptive approaches is a new avenue for research, and this study helps to pave the way for the useful application of distributed data mining methods.
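The adaptive sampling step described above can be illustrated with a minimal sketch. The abstract does not give the exact definition of the consistency measure, so the code below assumes one plausible choice: the fraction of previous partitions in which a point received its most frequent label, with labels assumed to be already aligned across partitions. The function names, the `floor` parameter, and the toy label histories are hypothetical and serve only to show how low-consistency points could receive a higher sampling probability.

```python
import random
from collections import Counter


def consistency(label_history):
    """Fraction of past partitions that agree with the point's most frequent label.

    Assumption: labels have already been aligned across partitions.
    """
    if not label_history:
        return 0.0  # never clustered yet: treat as maximally uncertain
    top_count = Counter(label_history).most_common(1)[0][1]
    return top_count / len(label_history)


def sampling_probabilities(histories, floor=0.05):
    """Turn per-point consistency into sampling probabilities.

    Less consistent points get larger weights; `floor` (hypothetical) keeps
    every point reachable so stable regions are not ignored entirely.
    """
    weights = [(1.0 - consistency(h)) + floor for h in histories]
    total = sum(weights)
    return [w / total for w in weights]


def draw_subsample(probs, size, rng):
    """Draw point indices with replacement according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=size)


# Toy label histories for three points over four earlier partitions.
histories = [
    [0, 0, 0, 0],  # always in the same cluster: fully consistent
    [0, 1, 0, 1],  # flips between clusters: likely near a boundary
    [1, 1, 1, 0],  # mostly stable
]
probs = sampling_probabilities(histories)
rng = random.Random(0)
sample = draw_subsample(probs, size=5, rng=rng)
```

In a full ensemble loop, each new subsample would be clustered, the resulting labels aligned with earlier partitions and appended to the histories, and the probabilities recomputed before the next draw. A non-adaptive scheme corresponds to keeping the probabilities uniform throughout.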