Objective: Clustering algorithms may be applied to DNA microarray data to identify novel subgroups that could lead to new taxonomies of diseases defined at the bio-molecular level. A major problem in identifying biologically meaningful clusters is assessing their reliability, since clustering algorithms may find clusters even when no structure is present.

Methodology: Methods based on random "perturbations" of the data, such as bootstrapping, noise-injection techniques and random subspace methods, have recently been applied to the problem of estimating cluster validity. Within this framework, we propose stability measures that exploit the high dimensionality of DNA microarray data and the redundancy of the information stored in microarray chips. To this end, we randomly project the original gene expression data into lower-dimensional subspaces, approximately preserving the distances between examples according to the Johnson-Lindenstrauss (JL) theory. The stability of the clusters discovered in the original high-dimensional space is estimated by comparing them with the clusters discovered in the randomly projected lower-dimensional subspaces. The proposed cluster-stability measures may be used to validate, and to quantitatively assess the reliability of, the clusters obtained by a large class of clustering algorithms.

Results and conclusion: We tested the effectiveness of our approach on high-dimensional synthetic data whose distribution is known a priori. On these data, the stability measures based on randomized maps correctly predict both the number of clusters and the reliability of each individual cluster. We then showed how to apply the proposed measures to the analysis of DNA microarray data, whose underlying distribution is unknown. We evaluated the validity of clusters discovered by hierarchical clustering algorithms in diffuse large B-cell lymphoma (DLBCL) and malignant melanoma patients. The results show that the proposed reliability measures can help bio-medical researchers identify stable clusters of patients and discover new disease subtypes characterized at the bio-molecular level.
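The procedure described under Methodology can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical, and several choices are assumed here for illustration only:

- a Gaussian projection matrix;
- the Dasgupta-Gupta form of the JL bound for choosing the target dimension;
- plain k-means as the clustering algorithm (the abstract applies the measures to hierarchical clustering, among others);
- a simple pairwise co-membership score (`cluster_stability`) in place of the paper's exact stability measures.

```python
import numpy as np


def jl_dimension(n_points: int, eps: float) -> int:
    """Target dimension from the Dasgupta-Gupta form of the JL lemma:
    pairwise squared distances among n_points are preserved within a
    factor (1 - eps, 1 + eps) with positive probability."""
    return int(np.ceil(4 * np.log(n_points) / (eps**2 / 2 - eps**3 / 3)))


def random_projection(X: np.ndarray, dim: int, rng: np.random.Generator) -> np.ndarray:
    """Gaussian random map (assumed choice). Rows of X are examples (e.g. patients),
    columns are features (e.g. genes). Entries ~ N(0, 1/dim), so squared norms
    are preserved in expectation."""
    R = rng.standard_normal((X.shape[1], dim)) / np.sqrt(dim)
    return X @ R


def kmeans(X: np.ndarray, k: int, n_iter: int = 50) -> np.ndarray:
    """Lloyd's k-means with deterministic farthest-point initialization,
    used here only as a stand-in for any clustering algorithm."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def cluster_stability(labels_orig: np.ndarray, labels_proj: np.ndarray) -> dict:
    """Hypothetical score: for each cluster found in the original space, the
    fraction of its (ordered) point pairs that are still co-clustered in the
    projected space. 1.0 means the cluster was never split."""
    scores = {}
    for c in np.unique(labels_orig):
        idx = np.flatnonzero(labels_orig == c)
        m = len(idx)
        if m < 2:
            scores[int(c)] = 1.0
            continue
        same = labels_proj[idx][:, None] == labels_proj[idx][None, :]
        scores[int(c)] = float((same.sum() - m) / (m * (m - 1)))  # drop diagonal
    return scores


def random_projection_stability(X: np.ndarray, k: int, dim: int,
                                n_proj: int = 20, seed: int = 0) -> dict:
    """Cluster once in the original space, then average the per-cluster
    score over n_proj independent random projections."""
    rng = np.random.default_rng(seed)
    base = kmeans(X, k)
    totals = {int(c): 0.0 for c in np.unique(base)}
    for _ in range(n_proj):
        proj_labels = kmeans(random_projection(X, dim, rng), k)
        for c, s in cluster_stability(base, proj_labels).items():
            totals[c] += s
    return {c: t / n_proj for c, t in totals.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    true = np.repeat(np.arange(3), 20)          # 3 synthetic groups of 20 "patients"
    means = rng.standard_normal((3, 2000))      # group-specific profiles over 2000 "genes"
    X = means[true] + rng.standard_normal((60, 2000))
    dim = jl_dimension(len(X), eps=0.5)
    print(dim, random_projection_stability(X, k=3, dim=dim))
```

On well-separated synthetic data like this, each cluster should score close to 1. A cluster whose members are often split apart across projections gets a lower score. This is the kind of signal the abstract describes for flagging clusters that may not be reliable. Because the score only compares co-membership, the arbitrary label permutations produced by re-clustering in each projected space do not affect it.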