A knowledge-driven method to evaluate multi-source clustering

  • Authors:
  • Chengyong Yang; Erliang Zeng; Tao Li; Giri Narasimhan

  • Affiliations:
  • Bioinformatics Research Group (BioRG), School of Computer Science, Florida International University, Miami, FL (all authors)

  • Venue:
  • ISPA'05: Proceedings of the 2005 International Conference on Parallel and Distributed Processing and Applications
  • Year:
  • 2005

Abstract

Recent research has demonstrated that biological literature can complement the information extracted from gene expression data to obtain better gene clusters. The Multi-Source Clustering (MSC) algorithm, recently proposed by the authors, performs semantic integration of information obtained from gene expression data and biomedical text literature. To address the challenge of evaluating clustering results, we propose a new knowledge-driven approach based on information extracted from a database of published binding sites of known transcription factors (TFs). We propose using a measure called the C-index for objective, quantitative evaluation. We compare the results of MSC on the integrated data sources with the results obtained (a, b) by clustering each of the two data sources separately and (c) by clustering after feature-level integration. We show that the C-index values of the clustering results from MSC are better than those from the other three approaches.
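
The abstract does not define the C-index, so the following is only a minimal sketch. It assumes the measure is the standard Hubert-Levin C-index, C = (S - S_min) / (S_max - S_min), where S is the sum of distances over all within-cluster pairs, and S_min and S_max are the sums of the k smallest and k largest pairwise distances for the same number k of pairs. Lower values indicate tighter clusters under this definition. The function name `c_index`, the toy distance matrix, and the label vectors are all hypothetical; in a knowledge-driven setting the distances would presumably be derived from TF binding-site information, which is not modeled here.

```python
from itertools import combinations


def c_index(distance, labels):
    """Hubert-Levin C-index for a clustering (assumed definition).

    distance: symmetric square matrix (list of lists) of pairwise distances.
    labels:   cluster label for each item, same order as the matrix rows.
    Returns a value in [0, 1]; 0 means the within-cluster pairs are
    exactly the k closest pairs overall.
    """
    n = len(labels)
    pairs = list(combinations(range(n), 2))

    # All pairwise distances, sorted ascending.
    all_d = sorted(distance[i][j] for i, j in pairs)

    # Distances between items that share a cluster.
    within = [distance[i][j] for i, j in pairs if labels[i] == labels[j]]
    k = len(within)
    if k == 0 or k == len(pairs):
        raise ValueError("need both within-cluster and between-cluster pairs")

    s = sum(within)
    s_min = sum(all_d[:k])   # best case: k smallest distances
    s_max = sum(all_d[-k:])  # worst case: k largest distances
    if s_max == s_min:
        raise ValueError("C-index is undefined when S_max equals S_min")
    return (s - s_min) / (s_max - s_min)


# Toy example: four items at positions 0, 1, 10, 11 on a line.
points = [0, 1, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]

good = c_index(dist, [0, 0, 1, 1])  # groups the nearby items together
bad = c_index(dist, [0, 1, 0, 1])   # splits the nearby items apart
print(good, bad)
```

Under this assumed definition, comparing several clusterings of the same genes amounts to computing one C-index per clustering against the same distance matrix and preferring the lowest value.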