Stable biclustering of gene expression data with nonnegative matrix factorizations

  • Authors:
  • Liviu Badea;Doina Tilivea

  • Affiliations:
  • AI group, National Institute for Research in Informatics;AI group, National Institute for Research in Informatics

  • Venue:
  • IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Although clustering is probably the most frequently used tool for data mining gene expression data, existing clustering approaches face at least one of the following problems in this domain: a huge number of variables (genes) as compared to the number of samples, high noise levels, the inability to naturally deal with overlapping clusters, the instability of the resulting clusters w.r.t. the initialization of the algorithm as well as the difficulty in clustering genes and samples simultaneously. In this paper we show that all of these problems can be elegantly dealt with by using nonnegative matrix factorizations to cluster genes and samples simultaneously while allowing for bicluster overlaps and by employing Positive Tensor Factorization to perform a two-way meta-clustering of the biclusters produced in several different clustering runs (thereby addressing the above-mentioned instability). The application of our approach to a large lung cancer dataset proved computationally tractable and was able to recover the histological classification of the various cancer subtypes represented in the dataset.