Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Integration of Cluster Ensemble and Text Summarization for Gene Expression Analysis
BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering
In this paper, we design and develop GE-Miner (Gene Expression Miner), a unified system that integrates cluster ensemble, text clustering, and multi-document summarization to provide an environment for comprehensive gene expression data analysis. We present a novel cluster ensemble approach that generates high-quality gene clusters. In the text summarization module, given a gene cluster, our Expectation-Maximization (EM) based algorithm automatically identifies subtopics and extracts the most probable terms for each subtopic. The top k topical terms extracted from each subtopic are then combined to form the biological explanation of the gene cluster. Experimental results demonstrate that our system obtains high-quality clusters and provides informative key terms for the gene clusters.
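The abstract does not specify the model behind the EM-based summarization step, so the following is only a minimal sketch of one common choice: a mixture of multinomials fitted with EM over the documents associated with a gene cluster, followed by taking the top k terms of each subtopic. The function name `em_topic_terms`, its parameters, the smoothing constant, and the toy documents are all assumptions made for illustration and are not taken from GE-Miner.

```python
import math
import random


def em_topic_terms(docs, n_topics=2, top_k=3, n_iter=50, seed=0, smooth=0.01):
    """Fit a mixture of multinomials with EM and return top terms per subtopic.

    docs: non-empty list of non-empty token lists (hypothetical input format).
    Requires n_iter >= 1. Returns (topics, resp), where topics[t] holds the
    top_k terms of subtopic t and resp[i][t] is P(subtopic t | document i).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    col = {w: j for j, w in enumerate(vocab)}

    # Term-count matrix: one row per document, one column per vocabulary term.
    counts = [[0] * len(vocab) for _ in docs]
    for i, doc in enumerate(docs):
        for w in doc:
            counts[i][col[w]] += 1

    # Start from random soft assignments of documents to subtopics.
    resp = []
    for _ in docs:
        row = [rng.random() + 1e-3 for _ in range(n_topics)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: re-estimate subtopic priors and smoothed term distributions.
        prior, term_prob = [], []
        for t in range(n_topics):
            weight = sum(resp[i][t] for i in range(len(docs)))
            prior.append((weight + smooth) / (len(docs) + smooth * n_topics))
            mass = [
                smooth + sum(resp[i][t] * counts[i][j] for i in range(len(docs)))
                for j in range(len(vocab))
            ]
            total = sum(mass)
            term_prob.append([m / total for m in mass])

        # E-step: posterior over subtopics per document, computed in log space.
        for i in range(len(docs)):
            log_post = [
                math.log(prior[t])
                + sum(c * math.log(term_prob[t][j])
                      for j, c in enumerate(counts[i]) if c)
                for t in range(n_topics)
            ]
            peak = max(log_post)
            unnorm = [math.exp(lp - peak) for lp in log_post]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]

    # Rank terms by probability within each subtopic (ties broken alphabetically).
    topics = []
    for t in range(n_topics):
        ranked = sorted(range(len(vocab)),
                        key=lambda j: (-term_prob[t][j], vocab[j]))
        topics.append([vocab[j] for j in ranked[:top_k]])
    return topics, resp


if __name__ == "__main__":
    # Toy token lists standing in for literature linked to one gene cluster.
    toy_docs = [
        ["kinase", "phosphorylation", "signal"],
        ["kinase", "signal", "receptor"],
        ["ribosome", "translation", "rna"],
        ["ribosome", "rna", "translation"],
    ]
    topics, _ = em_topic_terms(toy_docs, n_topics=2, top_k=2)
    for t, terms in enumerate(topics):
        print(f"subtopic {t}: {', '.join(terms)}")
```

Concatenating the per-subtopic term lists gives a key-term summary in the spirit of the "biological explanation" the abstract describes. EM only finds a local optimum, so the subtopics found can depend on the random seed and on the number of subtopics chosen.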