Stability-based validation of clustering solutions

Authors:
Tilman Lange;Volker Roth;Mikio L. Braun;Joachim M. Buhmann
Affiliations:
Swiss Federal Institute of Technology (ETH) Zurich, Institute for Computational Science, CH-8092 Zurich, Switzerland;Swiss Federal Institute of Technology (ETH) Zurich, Institute for Computational Science, CH-8092 Zurich, Switzerland;Rheinische Friedrich-Wilhelms-Universität Bonn, Institut für Informatik III, 53117 Bonn, Germany;Swiss Federal Institute of Technology (ETH) Zurich, Institute for Computational Science, CH-8092 Zurich, Switzerland
Venue:
Neural Computation
Year:
2004

Abstract

Data clustering describes a set of frequently employed techniques in exploratory data analysis to extract "natural" group structure in data. Such groupings need to be validated to separate the signal in the data from spurious structure. In this context, finding an appropriate number of clusters is a particularly important model selection question. We introduce a measure of cluster stability to assess the validity of a cluster model. This stability measure quantifies the reproducibility of clustering solutions on a second sample, and it can be interpreted as a classification risk with regard to class labels produced by a clustering algorithm. The preferred number of clusters is determined by minimizing this classification risk as a function of the number of clusters. Convincing results are achieved on simulated as well as gene expression data sets. Comparisons to other methods demonstrate the competitive performance of our method and its suitability as a general validation tool for clustering solutions in real-world problems.
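
The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of k-means as the clustering algorithm, the nearest-centroid rule used to transfer labels to the second sample, the brute-force search over label permutations, and all function names (`kmeans`, `disagreement`, `instability`) are assumptions made here for illustration. The sketch splits the data into two halves, clusters each half, predicts labels for the second half from the first half's solution, and reports the average fraction of points on which the two labelings disagree.

```python
import itertools
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centers):
    """Index of the center closest to the point (nearest-centroid rule)."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd iterations; an assumed stand-in for any clustering algorithm."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster runs empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [nearest(p, centers) for p in points]
    return labels, centers


def disagreement(pred, own, k):
    """Fraction of mismatched labels, minimized over all relabelings of pred.

    Cluster labels are arbitrary, so every permutation is tried.
    This is only feasible for small k (k! permutations).
    """
    n = len(pred)
    return min(
        sum(perm[p] != o for p, o in zip(pred, own)) / n
        for perm in itertools.permutations(range(k))
    )


def instability(points, k, n_splits=10, seed=0):
    """Average disagreement between transferred and own labels over random splits."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_splits):
        shuffled = points[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first, second = shuffled[:half], shuffled[half:]
        _, centers = kmeans(first, k, rng)   # solution on the first sample
        own, _ = kmeans(second, k, rng)      # solution on the second sample
        pred = [nearest(p, centers) for p in second]  # labels transferred to it
        total += disagreement(pred, own, k)
    return total / n_splits
```

Calling `instability(points, k)` on a list of coordinate tuples for several candidate values of k gives one disagreement rate per k, with lower values indicating a more reproducible solution. The sketch returns the raw rate only; comparing values across different k would likely need some form of normalization, which is left out here.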