Integration of Cluster Ensemble and Text Summarization for Gene Expression Analysis

  • Venue: BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
  • Year: 2004


Abstract

Generating high-quality gene clusters and identifying the biological mechanisms underlying them are the central goals of clustering-based gene expression analysis. To obtain high-quality clusters, most current approaches rely on choosing the best clustering algorithm, that is, the one whose design biases and assumptions match the underlying distribution of the data set. This approach has two problems: (1) the underlying distribution of gene expression data sets is usually unknown, and (2) so many clustering algorithms are available that choosing the proper one is very challenging. To provide a textual summary of gene clusters, the most explored approach is the extractive approach, which essentially builds on techniques borrowed from information retrieval; its objective is to provide terms for query expansion, not to act as a stand-alone summary of the entire document set. A further drawback is that clustering quality and cluster interpretation are treated as two isolated research problems and studied separately. Yet cluster quality and cluster interpretation are closely related and must be addressed in a coherent, unified way. Relatively high-quality clusters are needed first in order to obtain a correct, informative biological explanation of a gene cluster; otherwise, the biological explanation will be incorrect or misleading, no matter how good or robust the text summarization technique is. Based on this consideration, we design and develop a unified system, GE-Miner (Gene Expression Miner), that addresses these challenging issues in a principled and general manner by integrating cluster ensemble and text summarization, and that provides an environment for comprehensive gene expression data analysis. Experimental results demonstrate that our system can obtain high-quality clusters and provide concise and informative textual summaries of the gene clusters.
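The cluster-ensemble idea mentioned above avoids committing to a single algorithm by combining several base clusterings of the same genes. The abstract does not say which consensus function GE-Miner uses, so the sketch below illustrates one common, well-known choice instead: evidence accumulation over a co-association matrix, where genes that a majority of base clusterings place together are linked and the connected components become the consensus clusters. The function names `co_association` and `consensus_clusters`, the 0.5 threshold, and the toy labelings are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Fraction of base clusterings that place genes i and j in the same cluster."""
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all labelings must cover the same genes")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Divide integer counts once to avoid accumulating rounding error.
    return [[c / m for c in row] for row in counts]


def consensus_clusters(labelings: Sequence[Sequence[int]],
                       threshold: float = 0.5) -> List[int]:
    """Link genes whose co-association exceeds `threshold` and return the
    connected components as consensus labels (equivalent to cutting a
    single-linkage tree built on the co-association matrix)."""
    matrix = co_association(labelings)
    n = len(matrix)
    parent = list(range(n))  # union-find forest over genes

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if matrix[i][j] > threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    result = []
    for i in range(n):
        root = find(i)
        if root not in relabel:
            relabel[root] = len(relabel)
        result.append(relabel[root])
    return result


# Hypothetical example: three base clusterings of six genes.
labelings = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2, 2],
]
print(consensus_clusters(labelings))  # [0, 0, 1, 1, 2, 2]
```

In this toy case the second clustering disagrees with the other two, but the majority vote recovers the three gene pairs. Lowering the threshold merges more aggressively (at 0.3 every gene ends up in one cluster here), so in practice the cut level is itself a parameter that would need to be tuned or chosen by a cluster-validity criterion.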