Generating high-quality gene clusters and identifying the biological mechanisms underlying them are important goals of clustering-based gene expression analysis. To obtain high-quality clusters, most current approaches rely on choosing the best clustering algorithm, that is, one whose design biases and assumptions match the underlying distribution of the dataset. This approach has two problems: 1) the underlying data distribution of a gene expression dataset is usually unknown, and 2) so many clustering algorithms are available that choosing the proper one is very challenging. To provide a textual summary of gene clusters, the most explored approach is extractive and builds on techniques borrowed from information retrieval, where the objective is to provide terms for query expansion rather than a stand-alone summary of the entire document set. Another drawback is that clustering quality and cluster interpretation are treated as two isolated research problems and are studied separately. In this paper, we design and develop a unified system, Gene Expression Miner, that addresses these issues in a principled and general manner by integrating cluster ensembles, text clustering, and multidocument summarization, and that provides an environment for comprehensive gene expression data analysis. We present a novel cluster ensemble approach to generate high-quality gene clusters. In our text summarization module, given a gene cluster, our expectation-maximization based algorithm automatically identifies subtopics and extracts the most probable terms for each subtopic. The top k topical terms extracted from each subtopic are then combined to form the biological explanation of each gene cluster. Experimental results demonstrate that our system can obtain high-quality clusters and provide informative key terms for the gene clusters.
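The abstract does not describe how the cluster ensemble combines its input partitions, so the following is only a minimal sketch of one common ensemble technique, co-association (evidence accumulation) consensus, and should not be read as the method used in Gene Expression Miner. The function names, the 0.5 threshold, and the toy partitions are all assumptions made for illustration. Two genes are linked when more than the threshold fraction of base clusterings place them in the same cluster, and the consensus clusters are the connected components of those links.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Link items whose co-association exceeds `threshold` (an assumed cutoff)
    and return the connected components as consensus cluster labels."""
    n = len(partitions[0])
    co = co_association(partitions)
    parent = list(range(n))  # union-find forest over the items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    label_of = {}
    return [label_of.setdefault(find(i), len(label_of)) for i in range(n)]


# Hypothetical toy input: three base clusterings of five genes.
partitions = [
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
]
labels = consensus_clusters(partitions)  # [0, 0, 1, 1, 1]
```

Because the consensus depends only on how often pairs of genes co-occur, the base clusterings may use different label names and even different numbers of clusters, which is what makes this kind of combination independent of any single algorithm's assumptions.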