The three steps of clustering in the post-genomic era: a synopsis

Authors:
R. Giancarlo; G. Lo Bosco; L. Pinello; F. Utro
Affiliations:
Dipartimento di Matematica ed Informatica, Università di Palermo, Palermo, Italy; Dipartimento di Matematica ed Informatica, Università di Palermo, Palermo, Italy; Dipartimento di Matematica ed Informatica, Università di Palermo, Palermo, Italy; Computational Genomics Group, IBM T.J. Watson Research Center, Yorktown, NY
Venue:
CIBB'10 Proceedings of the 7th international conference on Computational intelligence methods for bioinformatics and biostatistics
Year:
2010

Citing 9

Algorithms for clustering data
Data clustering: a review. ACM Computing Surveys (CSUR)
Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning
Cluster analysis of gene expression data
Subquadratic Approximation Algorithms for Clustering Problems in High Dimensional Spaces. Machine Learning
Computational cluster validation in post-genomic data analysis. Bioinformatics
A fast k-means implementation using coresets. Proceedings of the twenty-second annual symposium on Computational geometry
An optimal hierarchical clustering algorithm for gene expression data. Information Processing Letters
Distance functions, clustering algorithms and microarray data analysis. LION'10 Proceedings of the 4th international conference on Learning and intelligent optimization

Cited 1

Proximity Measures for Clustering Gene Expression Microarray Data: A Validation Methodology and a Comparative Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Abstract

Clustering is one of the best-known activities in scientific investigation and an object of research in many disciplines, ranging from Statistics to Computer Science. Following Handl et al., it can be summarized as a three-step process: (a) choice of a distance function; (b) choice of a clustering algorithm; (c) choice of a validation method. Although such a purist approach to clustering is rarely seen in many areas of science, genomic data require that level of attention if inferences drawn from cluster analysis are to be relevant to biomedical research. Unfortunately, the high dimensionality of the data and their noisy nature make cluster analysis of genomic data particularly difficult. This paper highlights new findings that seem to address a few relevant problems in each of the three steps, with regard to both the intrinsic predictive power of methods and algorithms and their time performance. Including the latter aspect in the evaluation process is quite novel, since it is rarely considered in genomic data analysis.
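
As an illustration only, the three-step process summarized in the abstract might be sketched as follows. This is not the authors' method: the abstract does not name specific choices, so Pearson distance for step (a), average-linkage agglomerative clustering for step (b), and mean silhouette width for step (c) are assumed stand-ins, and the four toy expression profiles are invented for the example.

```python
import math

def pearson_distance(x, y):
    """Step (a), assumed choice: 1 - Pearson correlation of two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # treat a constant profile as uncorrelated
    return 1.0 - cov / (sx * sy)

def distance_matrix(data, dist):
    """Pairwise distances between all profiles, zero on the diagonal."""
    n = len(data)
    return [[0.0 if i == j else dist(data[i], data[j]) for j in range(n)]
            for i in range(n)]

def average_linkage(dm, k):
    """Step (b), assumed choice: merge the closest clusters until k remain."""
    clusters = [[i] for i in range(len(dm))]

    def link(a, b):
        # Average distance over all cross-cluster pairs.
        return sum(dm[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = link(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    labels = [0] * len(dm)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def silhouette(dm, labels):
    """Step (c), assumed choice: mean silhouette width (internal validation)."""
    ids = set(labels)
    if len(ids) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster contributes 0 by convention
        a = sum(dm[i][j] for j in own) / len(own)
        b = min(sum(dm[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in ids if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / n

# Invented toy data: two rising and two falling expression profiles.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 2.5],
]

dm = distance_matrix(profiles, pearson_distance)   # step (a)
labels = average_linkage(dm, k=2)                  # step (b)
score = silhouette(dm, labels)                     # step (c)
print(labels, round(score, 3))
```

Each step is a separate function taking the previous step's output, so a different distance, algorithm, or validation index can be swapped in without touching the others. The quadratic and cubic loops here are written for readability and make no attempt at the time performance the abstract emphasizes.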