Biclustering of gene expression data based on related genes and conditions extraction

  • Authors:
  • Dechun Yan; Jiajun Wang

  • Affiliations:
  • School of Electronic and Information Engineering, Soochow University, Suzhou 215006, PR China (both authors)

  • Venue:
  • Pattern Recognition
  • Year:
  • 2013

Abstract

Biclustering is an important tool for finding patterns in a microarray data matrix by classifying genes and conditions simultaneously. In most existing biclustering algorithms, almost all genes and conditions take part in the clustering process even if they contribute little to a bicluster. We instead propose to perform biclustering only on the genes and conditions related to a given bicluster type. In our algorithm, the gene expression matrix is first partitioned into stable and unstable submatrices in both the row and column directions by measuring the similarity between each row (or column) vector and the all-ones vector. Next, the genes and conditions related to a given type of bicluster are extracted by inspecting row or column pairs in the corresponding stable or unstable submatrices. Finally, biclusters of any type are obtained by performing clustering analysis on the extracted genes and conditions. Additionally, we present a novel strategy for estimating missing data in the gene expression matrix, based on James-Stein and kernel estimation principles, in which the estimation matrix is obtained with the k-means algorithm. Experimental results show excellent performance of our algorithm in both missing data estimation and biclustering.
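The first step of the algorithm, splitting rows and columns into stable and unstable sets by comparing each vector with the all-ones vector, can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the decision rule, so this sketch assumes cosine similarity and a hypothetical `threshold` parameter; the function names `cosine_to_ones`, `split_stable`, and `partition_matrix` are illustrative and are not the authors' implementation. Under this assumption, a row whose values are nearly constant points almost in the direction of the all-ones vector and is labeled stable.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def cosine_to_ones(vec: List[float]) -> float:
    """Cosine similarity between vec and the all-ones vector of the same length."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        # Assumption: an all-zero vector has no direction, so it is treated as unstable.
        return 0.0
    # <vec, 1> = sum(vec) and ||1|| = sqrt(n)
    return sum(vec) / (norm * math.sqrt(len(vec)))


def split_stable(vectors: Matrix, threshold: float) -> Tuple[List[int], List[int]]:
    """Return (stable indices, unstable indices) using a hypothetical similarity threshold."""
    stable: List[int] = []
    unstable: List[int] = []
    for idx, vec in enumerate(vectors):
        if cosine_to_ones(vec) >= threshold:
            stable.append(idx)
        else:
            unstable.append(idx)
    return stable, unstable


def partition_matrix(matrix: Matrix, threshold: float = 0.99):
    """Partition rows (genes) and columns (conditions) into stable/unstable index sets."""
    row_split = split_stable(matrix, threshold)
    columns = [list(col) for col in zip(*matrix)]  # transpose to inspect conditions
    col_split = split_stable(columns, threshold)
    return row_split, col_split


if __name__ == "__main__":
    expr = [
        [2.0, 2.0, 2.0],  # constant row: similarity exactly 1
        [1.0, 5.0, 1.0],  # strongly varying row
        [3.0, 3.1, 2.9],  # nearly constant row
    ]
    print(partition_matrix(expr))
```

With the default threshold, rows 0 and 2 are labeled stable and row 1 unstable, while none of the three columns pass the threshold. The choice of cosine similarity here is only one plausible reading of "similarity to the full 1s vector"; the paper may normalize the data or use a different measure before the related-gene extraction step.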