Integration of Cluster Ensemble and Text Summarization for Gene Expression Analysis

Authors:
Affiliations:
Venue: BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
Year: 2004

Abstract

Generating high-quality gene clusters and identifying the biological mechanism underlying each gene cluster are important goals of gene expression cluster analysis. To obtain high-quality cluster results, most current approaches rely on choosing the best clustering algorithm, that is, the one whose design biases and assumptions match the underlying distribution of the data set. This approach has two problems: (1) the underlying distribution of gene expression data sets is usually unknown, and (2) so many clustering algorithms are available that choosing the proper one is very challenging. To provide a textual summary of gene clusters, the most widely explored approach is the extractive approach, which builds on techniques borrowed from information retrieval whose objective is to provide terms for query expansion, not to act as a stand-alone summary of the entire document set. A further drawback is that clustering quality and cluster interpretation are treated as two isolated research problems and are studied separately. Yet cluster quality and cluster interpretation are closely related and must be addressed in a coherent, unified way. Relatively high-quality clusters are needed first in order to obtain a correct, informative biological explanation of a gene cluster; otherwise, the biological explanation will be incorrect or misleading, no matter how good or robust the text summarization technique is. Based on this consideration, we design and develop a unified system, GE-Miner (Gene Expression Miner), which addresses these challenging issues in a principled and general manner by integrating cluster ensemble and text summarization, and which provides an environment for comprehensive gene expression data analysis. Experimental results demonstrate that our system can obtain high-quality clusters and provide concise, informative textual summaries of the gene clusters.
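The abstract does not say how GE-Miner combines clusterings, so the sketch below only illustrates the general cluster-ensemble idea of combining several base clusterings instead of betting on one algorithm. It uses a common generic scheme (a co-association matrix followed by majority-vote linking), which is an assumption and not necessarily the authors' method. The function names `co_association` and `consensus`, the 0.5 threshold, and the toy labelings are hypothetical.

```python
from itertools import combinations


def co_association(labelings):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    clusterings that place gene i and gene j in the same cluster."""
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    co = [[c / m for c in row] for row in counts]
    for i in range(n):
        co[i][i] = 1.0  # every gene always co-occurs with itself
    return co


def consensus(labelings, threshold=0.5):
    """Hypothetical consensus step: link two genes when more than `threshold`
    of the base clusterings agree, then take connected components."""
    n = len(labelings[0])
    co = co_association(labelings)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    relabel = {}
    result = []
    for i in range(n):
        root = find(i)
        if root not in relabel:
            relabel[root] = len(relabel)
        result.append(relabel[root])
    return result


# Three hypothetical base clusterings of six genes (e.g. from different algorithms).
a = [0, 0, 0, 1, 1, 1]
b = [1, 1, 0, 0, 2, 2]
c = [0, 0, 0, 1, 1, 2]
print(consensus([a, b, c]))  # [0, 0, 0, 1, 1, 1]
```

Because the co-association matrix only records whether two genes share a cluster, the base clusterings may use arbitrary label IDs and different numbers of clusters; no label matching between algorithms is needed.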