Sparse learning based linear coherent bi-clustering

  • Authors:
  • Yi Shi;Xiaoping Liao;Xinhua Zhang;Guohui Lin;Dale Schuurmans

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada;Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada

  • Venue:
  • WABI'12 Proceedings of the 12th international conference on Algorithms in Bioinformatics
  • Year:
  • 2012

Abstract

Clustering algorithms are often limited by the assumption that each data point belongs to a single class and, furthermore, that all features of a data point are relevant to determining its class. Such assumptions are inappropriate in applications such as gene clustering, where, given expression profile data, genes may exhibit similar behavior under only some of the conditions, and a gene may participate in more than one functional process and hence belong to multiple groups. Identifying genes that have similar expression patterns in a common subset of conditions is a central problem in gene expression microarray analysis. To overcome the limitations of standard clustering methods for this purpose, bi-clustering has often been proposed as an alternative approach, in which one seeks groups of observations that exhibit similar patterns over a subset of the features. In this paper, we propose a novel sparse learning based bi-clustering model, SLLB, for identifying linear-coherent bi-clusters in gene expression data, a structure that strictly generalizes the types of bi-cluster considered by other methods. The model builds on recent sparse learning techniques that have gained significant attention in the machine learning research community. Experiments on both synthetic data and real gene expression data demonstrate that the model is significantly more effective than current bi-clustering algorithms for these problems. The parameter selection problem and the model's usefulness in other machine learning clustering applications are also discussed. The on-line appendix for this paper can be found at http://www.cs.ualberta.ca/~ys3/SLLB.