Discovering significant structures in clustered bio-molecular data through the Bernstein inequality

Authors:
Alberto Bertoni;Giorgio Valentini
Affiliations:
DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Milano, Italia;DSI, Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano, Milano, Italia
Venue:
KES'07/WIRN'07 Proceedings of the 11th international conference, KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part III
Year:
2007


Abstract

Searching for structures in complex bio-molecular data is a central issue in several branches of bioinformatics. In particular, the reliability of clusters discovered by a given clustering algorithm has recently been assessed through methods based on the concept of stability with respect to random perturbations of the data. In this context, a major problem is to assess the confidence of the measures of reliability. We discuss a partially "distribution independent" method based on the classical Bernstein inequality to assess the statistical significance of the discovered clusterings. Experimental results with gene expression data show the effectiveness of the proposed approach.
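
The abstract does not spell out the test itself, so the following is only a minimal sketch of the classical Bernstein inequality that it names, not the authors' procedure. It assumes n independent, identically distributed values bounded in [0, 1] (for instance similarity scores between clusterings of perturbed data) with variance at most `variance`, and it bounds the probability that their sample mean exceeds the true mean by at least `delta`. The function name `bernstein_tail_bound` and its parameters are hypothetical and chosen for illustration.

```python
import math


def bernstein_tail_bound(n, delta, variance, bound=1.0):
    """Upper bound on P(sample_mean - true_mean >= delta).

    Classical one-sided Bernstein inequality for n i.i.d. variables whose
    deviation from the mean is at most `bound` in absolute value and whose
    variance is at most `variance`:

        P(mean - mu >= delta) <= exp(-n * delta^2 / (2*variance + 2*bound*delta/3))

    Illustrative helper only (an assumption, not the paper's test).
    """
    if n <= 0 or delta < 0 or variance < 0 or bound <= 0:
        raise ValueError("need n > 0, delta >= 0, variance >= 0, bound > 0")
    if delta == 0:
        # A zero deviation gives the trivial bound.
        return 1.0
    exponent = n * delta ** 2 / (2.0 * variance + 2.0 * bound * delta / 3.0)
    return math.exp(-exponent)


if __name__ == "__main__":
    # Hypothetical numbers: 100 perturbation runs, deviation 0.1, variance 0.01.
    print(bernstein_tail_bound(n=100, delta=0.1, variance=0.01))
```

The bound needs only boundedness and a variance estimate rather than a specific distribution, which is consistent with the "partially distribution independent" wording of the abstract; it tightens as the number of perturbation runs grows or the variance shrinks.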