Integration of cluster ensemble and EM based text mining for microarray gene cluster identification and annotation

  • Authors:
  • Xiaohua Hu; Xiaodan Zhang; Xiaohua Zhou

  • Affiliations:
  • Drexel Univ., Philadelphia, PA; Drexel Univ., Philadelphia, PA; Drexel Univ., Philadelphia, PA

  • Venue:
  • CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
  • Year:
  • 2006

Abstract

In this paper, we design and develop GE-Miner (Gene Expression Miner), a unified system that integrates cluster ensemble, text clustering, and multi-document summarization to provide an environment for comprehensive gene expression data analysis. We present a novel cluster ensemble approach to generate high-quality gene clusters. In our text summarization module, given a gene cluster, our Expectation Maximization (EM) based algorithm automatically identifies subtopics and extracts the most probable terms for each subtopic. The top k topical terms extracted from each subtopic are then combined to form the biological explanation of the gene cluster. Experimental results demonstrate that our system obtains high-quality clusters and provides informative key terms for them.
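
To make the summarization step more concrete, the following is a minimal sketch of EM-based subtopic discovery with top-k term extraction. The abstract does not specify the probabilistic model, so a mixture of multinomials over term counts is assumed here. The function names (`em_multinomial_mixture`, `top_k_terms`, `annotate_cluster`), the smoothing constant, and the toy vocabulary and counts are all hypothetical illustrations, not part of GE-Miner.

```python
import numpy as np

def em_multinomial_mixture(counts, n_topics, n_iter=50, seed=0, smooth=1e-2):
    """Fit a mixture of multinomials to a document-term count matrix with EM.

    Assumed model (not taken from the paper): each document is drawn from
    one of `n_topics` term distributions.
    """
    rng = np.random.default_rng(seed)
    n_docs, _ = counts.shape
    # Random soft assignments to start; each row sums to 1.
    resp = rng.dirichlet(np.ones(n_topics), size=n_docs)
    for _ in range(n_iter):
        # M-step: topic priors and smoothed per-topic term distributions.
        prior = resp.sum(axis=0) / n_docs
        term_prob = resp.T @ counts + smooth
        term_prob /= term_prob.sum(axis=1, keepdims=True)
        # E-step: posterior over topics for each document, in log space.
        log_post = np.log(prior + 1e-12) + counts @ np.log(term_prob).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return prior, term_prob, resp

def top_k_terms(term_prob, vocab, k):
    """Return the k most probable terms for each subtopic."""
    order = np.argsort(-term_prob, axis=1)[:, :k]
    return [[vocab[j] for j in row] for row in order]

def annotate_cluster(term_prob, vocab, k):
    """Combine the top-k terms of all subtopics into one de-duplicated list."""
    annotation = []
    for terms in top_k_terms(term_prob, vocab, k):
        for term in terms:
            if term not in annotation:
                annotation.append(term)
    return annotation

# Toy data (hypothetical): rows are documents about the genes in one cluster.
vocab = ["kinase", "phosphorylation", "ribosome", "translation"]
counts = np.array([[5, 4, 0, 0],
                   [6, 3, 0, 0],
                   [0, 0, 5, 4],
                   [0, 1, 4, 6]])
prior, term_prob, resp = em_multinomial_mixture(counts, n_topics=2)
print(annotate_cluster(term_prob, vocab, k=2))
```

Because EM starts from random soft assignments, the order of the subtopics, and in unlucky cases the subtopics themselves, can vary with the seed. In practice one would likely run several restarts and keep the fit with the highest likelihood.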