Nonnegative Decompositions with Resampling for Improving Gene Expression Data Biclustering Stability

Authors:
Liviu Badea;Doina Ţilivea
Affiliations:
National Institute for Research in Informatics, email: badea@ici.ro;National Institute for Research in Informatics, email: badea@ici.ro
Venue:
Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
Year:
2008

Abstract

The small sample sizes and high dimensionality of gene expression datasets pose significant problems for unsupervised subgroup discovery. While the stability of unidimensional clustering algorithms has been addressed previously, generalizing existing approaches to biclustering has proved extremely difficult. Despite these difficulties, a stable biclustering algorithm is essential for analyzing gene expression data: genes tend to be co-expressed only for subsets of samples, in specific biological contexts, so the gene and sample dimensions have to be taken into account simultaneously. In this paper, we describe an elegant approach for ensuring bicluster stability that combines three ideas. First, a slight modification of nonnegative matrix factorization that allows intercepts for genes, which has proved superior to other biclustering methods, is used for base-level clustering. Second, a continuous-weight resampling method for samples is employed to generate slight perturbations of the dataset without sacrificing data. Third, a positive tensor factorization is used to extract the biclusters that are common to the various runs. Finally, we present an application to a large colon cancer dataset, for which we find 5 stable subclasses.
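
To make the first two ideas concrete, the following is a minimal sketch, not the authors' implementation. It assumes a squared-error objective with standard multiplicative updates, models the gene intercepts as an extra factor whose sample profile is fixed to ones, and draws the continuous sample weights from a Dirichlet distribution (a Bayesian-bootstrap-style choice). The abstract specifies none of these details, and the names `weighted_nmf_with_offset` and `resampled_runs` are hypothetical.

```python
import numpy as np

def weighted_nmf_with_offset(X, k, w, n_iter=200, seed=0, eps=1e-9):
    """Fit X ~ A @ S + offset (all nonnegative) under per-sample weights w.

    X: genes x samples, nonnegative. w: one positive weight per sample.
    The intercept is the last column of an augmented A whose matching
    row of S is held at ones (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    A = rng.random((n_genes, k + 1)) + 0.1
    S = rng.random((k + 1, n_samples)) + 0.1
    S[-1, :] = 1.0  # constant row: its column of A acts as a gene intercept
    errors = []
    for _ in range(n_iter):
        R = A @ S
        # Weighted multiplicative update for the gene factors and intercept.
        A *= ((X * w) @ S.T) / ((R * w) @ S.T + eps)
        R = A @ S
        # Per-sample weights cancel in the update for S, so the unweighted
        # form is used; the constant last row is never updated.
        S[:k] *= (A.T @ X)[:k] / ((A.T @ R)[:k] + eps)
        errors.append(float(np.sum(w * (X - A @ S) ** 2)))
    return A[:, :k], S[:k], A[:, k], errors

def resampled_runs(X, k, n_runs=5, seed=0):
    """Refit on continuously reweighted copies of the dataset."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[1]
    runs = []
    for r in range(n_runs):
        # Weights average 1 and stay positive, so no sample is discarded.
        w = rng.dirichlet(np.ones(n_samples)) * n_samples
        runs.append(weighted_nmf_with_offset(X, k, w, seed=r))
    return runs
```

Calling `resampled_runs(X, k=5)` on a nonnegative genes-by-samples matrix returns one `(A, S, offset, errors)` tuple per perturbed run. The third idea, a positive tensor factorization that extracts the biclusters shared across these runs, is not shown here.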