Generating high-quality gene clusters and identifying the underlying biological mechanism of each gene cluster are the two main goals of clustering in gene expression analysis. To get high-quality cluster results, most current approaches rely on choosing the best clustering algorithm, one whose design biases and assumptions match the underlying distribution of the data set. There are two issues with this approach: (1) the underlying data distribution of gene expression data sets is usually unknown, and (2) so many clustering algorithms are available that it is very challenging to choose the proper one. To provide a textual summary of the gene clusters, the most explored approach is the extractive approach, which essentially builds upon techniques borrowed from information retrieval, where the objective is to provide terms to be used for query expansion and not to act as a stand-alone summary for an entire document set. Another drawback is that clustering quality and cluster interpretation are treated as two isolated research problems and are studied separately. But cluster quality and cluster interpretation are closely related and must be addressed in a coherent and unified way. It is essential to have relatively high-quality clusters first in order to get a correct, informative biological explanation of a gene cluster; otherwise, the biological explanation will be incorrect or misleading, no matter how good or robust the text summarization technique is. Based on this consideration, we design and develop a unified system, GE-Miner (Gene Expression Miner), that addresses these challenging issues in a principled and general manner by integrating cluster ensemble and text summarization, and that provides an environment for comprehensive gene expression data analysis. Experimental results demonstrate that our system can obtain high-quality clusters and provide concise and informative textual summaries for the gene clusters.