Biclustering of Gene Expression Data by Simulated Annealing

  • Authors: Anupam Chakraborty

  • Affiliations: Indian Institute of Technology Kharagpur, India

  • Venue: HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
  • Year: 2005

Abstract

A bicluster of a gene expression dataset is a subset of genes that exhibit similar expression patterns across a subset of conditions. Biclustering algorithms aim to find subsets of genes and subsets of conditions such that a single cellular process is the main contributor to the expression of the gene subset over the condition subset. We believe that biclusters should be small compared to the gene expression data matrix, and we have observed that a conceptually simpler way to bicluster gene expression data is to apply standard one-way clustering algorithms to the rows and columns of the data matrix separately and then combine the results to obtain bicluster seeds. Our algorithm has three phases. In the first phase, we generate a set of high-quality bicluster seeds. In the second phase, these seeds are enlarged by adding more genes and conditions using a simulated-annealing-based technique. In the third phase, we compute the p-values of the resulting biclusters for statistical validation.

Keywords: gene expression data, k-means clustering, biclustering of expression data, p-value, simulated annealing.
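
The seed-enlargement phase can be illustrated with a minimal sketch. The abstract does not state the objective function, the move set, or the cooling schedule, so everything below is an assumption made for illustration: coherence is scored with the mean squared residue of Cheng and Church, the hypothetical `size_weight` term rewards larger biclusters, moves toggle one non-seed gene or condition, and cooling is geometric. The seed is assumed to be given (for example, from combining row and column k-means clusters) and is always retained. The names `mean_squared_residue`, `cost`, and `anneal_enlarge` are illustrative and are not taken from the paper.

```python
import math
import random


def mean_squared_residue(data, rows, cols):
    """Mean squared residue of the submatrix (lower means more coherent)."""
    rows, cols = sorted(rows), sorted(cols)
    row_mean = {i: sum(data[i][j] for j in cols) / len(cols) for i in rows}
    col_mean = {j: sum(data[i][j] for i in rows) / len(rows) for j in cols}
    total_mean = sum(row_mean.values()) / len(rows)
    residue = sum(
        (data[i][j] - row_mean[i] - col_mean[j] + total_mean) ** 2
        for i in rows
        for j in cols
    )
    return residue / (len(rows) * len(cols))


def cost(data, rows, cols, size_weight):
    # Assumed objective: low residue is good, larger biclusters are rewarded.
    residue = mean_squared_residue(data, rows, cols)
    return residue - size_weight * (len(rows) + len(cols))


def anneal_enlarge(data, seed_rows, seed_cols, size_weight=0.05,
                   t_start=1.0, t_end=1e-3, cooling=0.95,
                   moves_per_temp=50, rng=None):
    """Grow a bicluster seed by simulated annealing; the seed is never removed."""
    rng = rng or random.Random(0)
    n_rows, n_cols = len(data), len(data[0])
    seed_rows, seed_cols = set(seed_rows), set(seed_cols)
    rows, cols = set(seed_rows), set(seed_cols)
    current = cost(data, rows, cols, size_weight)
    best_cost, best_rows, best_cols = current, set(rows), set(cols)

    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            # Propose toggling one gene (row) or one condition (column).
            if rng.random() < 0.5:
                members, seed, idx = rows, seed_rows, rng.randrange(n_rows)
            else:
                members, seed, idx = cols, seed_cols, rng.randrange(n_cols)
            if idx in seed:
                continue  # seed members stay in the bicluster
            was_member = idx in members
            if was_member:
                members.remove(idx)
            else:
                members.add(idx)

            candidate = cost(data, rows, cols, size_weight)
            delta = candidate - current
            # Metropolis rule: always accept improvements, sometimes accept
            # worse moves with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = candidate
                if current < best_cost:
                    best_cost, best_rows, best_cols = current, set(rows), set(cols)
            else:
                # Rejected: undo the toggle.
                if was_member:
                    members.add(idx)
                else:
                    members.remove(idx)
        t *= cooling  # geometric cooling (assumed schedule)
    return best_rows, best_cols
```

Calling `anneal_enlarge(data, {0, 1}, {0, 1})` on a list-of-lists expression matrix returns the best row and column sets visited. Because the search starts from the seed and only records improvements, the returned bicluster never scores worse than the seed under the assumed cost.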