Enhanced clustering of biomedical documents using ensemble non-negative matrix factorization

Authors:
Xiaodi Huang;Xiaodong Zheng;Wei Yuan;Fei Wang;Shanfeng Zhu
Affiliations:
School of Computing and Mathematics, Charles Sturt University, Albury, NSW 2640, Australia and State Key Lab of Software Engineering, Wuhan University, Wuhan 430072, China;The School of Computer Science, Fudan University, Shanghai 200433, China and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China;The School of Computer Science, Fudan University, Shanghai 200433, China and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China;The School of Computer Science, Fudan University, Shanghai 200433, China and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China;The School of Computer Science, Fudan University, Shanghai 200433, China and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai 200433, China and State Key Lab of S ...
Venue:
Information Sciences: an International Journal
Year:
2011

Abstract

Biomedical researchers commonly generate scientific hypotheses by searching and mining biomedical literature databases. Clustering can help them form hypotheses by grouping related documents so that valuable information is easier to find. Although many clustering algorithms are available, it is unclear which one is best suited to clustering biomedical documents accurately, and this paper attempts to answer that question. Non-negative matrix factorization (NMF) has been widely applied to clustering general text documents. However, its clustering results are sensitive to the initial values of the NMF parameters. To overcome this drawback, we present ensemble NMF for clustering biomedical documents. The performance of ensemble NMF was evaluated on numerous datasets generated from the TREC Genomics track dataset. On most datasets, the experimental results show that ensemble NMF significantly outperforms the classical clustering algorithms bisecting K-means and hierarchical clustering. We also compared four different methods for constructing an ensemble NMF. For clustering biomedical documents, this research is the first to compare ensemble NMF with typical classical clustering algorithms, and it validates ensemble NMF constructed from different graph-based ensemble algorithms. This is also the first work on ensemble NMF with Hybrid Bipartite Graph Formulation for clustering biomedical documents.
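
The ensemble idea in the abstract can be illustrated with a minimal sketch. It assumes several base partitions are already available, for example from NMF runs started with different random initial values. The label lists in `runs` are hypothetical and not taken from the paper. The consensus step shown here is a simple co-association (majority vote over document pairs) followed by connected components, which is a stand-in for illustration only and is not one of the graph-based ensemble algorithms evaluated in the paper. The function names `co_association` and `consensus_labels` are likewise assumptions.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of base partitions that put each pair of documents together."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Link documents co-clustered in more than `threshold` of the runs and
    return the connected components as consensus clusters."""
    co = co_association(partitions)
    n = len(co)
    parent = list(range(n))  # union-find forest over documents

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(j)] = find(i)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical base partitions of six documents (assumed NMF outputs):
runs = [
    [0, 0, 0, 1, 1, 1],  # run 1
    [1, 1, 1, 0, 0, 0],  # run 2: same grouping, labels permuted
    [0, 0, 1, 1, 1, 1],  # run 3: document 2 assigned differently
]
print(consensus_labels(runs))  # [0, 0, 0, 1, 1, 1]
```

Because the co-association matrix depends only on whether two documents share a label, the permuted labels in run 2 do not matter, and the single disagreement in run 3 is outvoted by the other two runs.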