Semi-Supervised Spectral Clustering for Regulatory Module Discovery

  • Authors:
  • Alok Mishra; Duncan Gillies

  • Affiliations:
  • Imperial College, London, UK SW7 2AZ; Imperial College, London, UK SW7 2AZ

  • Venue:
  • DILS '08 Proceedings of the 5th international workshop on Data Integration in the Life Sciences
  • Year:
  • 2008

Abstract

We propose a novel semi-supervised clustering method for the task of gene regulatory module discovery. The technique uses DNA-binding data as prior knowledge to guide the spectral clustering of microarray experiments. The microarray data from a set of repeat experiments are converted to an affinity (similarity) matrix using a Gaussian function. We investigated two methods for determining the optimal Gaussian variance for this purpose: the first is based on a statistical measure of cluster coherence, and the second on maximising the number of constraints satisfied in the clustering process. The constraints, which were derived from DNA-binding data, were used to adjust the affinity matrix to include known gene-gene interactions. Clusters were found using a spectral clustering algorithm and validated with a biological significance score, defined as the proportion of gene pairs within the resulting clusters that share a common transcription factor. Our results indicate that the technique can successfully leverage the information available in the DNA-binding data. To the best of our knowledge, this is a novel formulation for the purpose of gene module discovery.
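The pipeline summarised in the abstract (Gaussian affinity, constraint adjustment, spectral clustering, variance selection by constraint satisfaction, and a shared-transcription-factor score) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions. The kernel is taken as exp(-d²/(2σ²)). Constrained gene pairs simply have their affinity set to 1. The clustering step uses the normalised Ng-Jordan-Weiss spectral variant with a farthest-point-initialised k-means. All function names (`gaussian_affinity`, `apply_constraints`, `select_sigma`, and so on) are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Genes x experiments matrix -> Gaussian affinity (assumed kernel exp(-d^2 / (2 sigma^2)))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)  # no self-affinity
    return A

def apply_constraints(A, must_link, value=1.0):
    """Hypothetical adjustment: raise affinity of gene pairs linked by DNA-binding data."""
    A = A.copy()
    for i, j in must_link:
        A[i, j] = A[j, i] = value
    return A

def kmeans(Y, k, n_iter=100):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(A, k):
    """Normalised spectral clustering (Ng-Jordan-Weiss style, an assumed variant)."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(A.sum(axis=1), 1e-12))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, -k:]                      # eigenvectors of the k largest eigenvalues
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)

def constraint_satisfaction(labels, pairs):
    """Fraction of constrained gene pairs placed in the same cluster."""
    if not pairs:
        return 0.0
    return sum(labels[i] == labels[j] for i, j in pairs) / len(pairs)

def select_sigma(X, k, sigmas, must_link):
    """Pick the Gaussian width that maximises constraint satisfaction (assumed procedure)."""
    best_sigma, best_score = None, -1.0
    for sigma in sigmas:
        A = apply_constraints(gaussian_affinity(X, sigma), must_link)
        score = constraint_satisfaction(spectral_clustering(A, k), must_link)
        if score > best_score:
            best_sigma, best_score = sigma, score
    return best_sigma, best_score

def biological_significance(labels, shares_tf):
    """Proportion of within-cluster gene pairs that share a transcription factor."""
    n = len(labels)
    total = shared = 0
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == labels[j]:
                total += 1
                shared += int(shares_tf[i, j])
    return shared / total if total else 0.0
```

In this sketch, `shares_tf` is a boolean gene x gene matrix marking pairs bound by a common transcription factor. The constraint-adjusted matrix is what gets clustered, so the prior knowledge pulls co-bound genes together before the eigen-decomposition. Note that maximising satisfaction of the same constraints that were injected into the affinity matrix is optimistic; a held-out subset of constraints would give a less biased choice of σ.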