Cluster structure inference based on clustering stability with applications to microarray data analysis

Authors:
Ciprian Doru Giurcaneanu; Ioan Tabus
Affiliations:
Institute of Signal Processing, Tampere University of Technology, Tampere, Finland; Institute of Signal Processing, Tampere University of Technology, Tampere, Finland
Venue:
EURASIP Journal on Applied Signal Processing
Year:
2004

Abstract

This paper focuses on the stability-based approach to estimating the number of clusters K in microarray data. The cluster stability approach amounts to repeatedly clustering random subsets of the available data and evaluating an index that expresses the similarity of the successive partitions obtained. We present a method for automatically estimating K from the distribution of the similarity index. We investigate how the choice of hierarchical clustering (HC) method and the choice of similarity index each influence the estimation accuracy. The paper introduces a new similarity index based on a partition distance. The performance of the new index and that of other well-known indices is experimentally evaluated by comparing the "true" data partition with the partition obtained at each level of an HC tree. A case study is conducted on a publicly available Leukemia dataset.
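
The resampling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not define their partition-distance index or their rule for reading K off the index distribution, so this sketch substitutes the well-known Rand index, a naive single-linkage clustering, and synthetic 2-D points in place of microarray profiles. All function names, the subset fraction, and the number of resampling rounds are assumptions made for illustration.

```python
import random
from itertools import combinations

def single_linkage(points, k):
    """Naive agglomerative clustering (single linkage), cut at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    while len(clusters) > k:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i, j in combinations(range(len(clusters)), 2):
            d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected

    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels

def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree."""
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total

def stability_scores(points, k, n_rounds=10, frac=0.8, seed=0):
    """Cluster pairs of random subsets and compare them on shared items."""
    rng = random.Random(seed)
    n = len(points)
    m = int(frac * n)
    scores = []
    for _ in range(n_rounds):
        s1 = rng.sample(range(n), m)
        s2 = rng.sample(range(n), m)
        l1 = dict(zip(s1, single_linkage([points[i] for i in s1], k)))
        l2 = dict(zip(s2, single_linkage([points[i] for i in s2], k)))
        common = sorted(set(s1) & set(s2))
        scores.append(rand_index([l1[i] for i in common],
                                 [l2[i] for i in common]))
    return scores

# Synthetic data (an assumption): three well-separated blobs of 10 points.
data_rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + data_rng.uniform(-1, 1), cy + data_rng.uniform(-1, 1))
          for cx, cy in centers for _ in range(10)]

# Mean similarity per candidate K; the full list of scores per K is the
# empirical distribution of the index for that K.
means = {}
for k in (2, 3, 4):
    scores = stability_scores(points, k)
    means[k] = sum(scores) / len(scores)
    print(k, round(means[k], 3))
```

On this toy data the blobs are separated by construction, so every pair of subsets is partitioned identically at K = 3; the values for other K depend on the random draws. Selecting the K with the highest mean similarity would be a simplification of the distribution-based estimate the abstract describes.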