A comprehensive comparison study of document clustering for a biomedical digital library MEDLINE

Authors:
Illhoi Yoo;Xiaohua Hu
Affiliations:
Drexel University, Philadelphia, PA;Drexel University, Philadelphia, PA
Venue:
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
Year:
2006

Citing 18
Cited 8

Recent trends in hierarchic document clustering: a critical review

Information Processing and Management: an International Journal
Scatter/Gather: a cluster-based approach to browsing large document collections

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Toward principles for the design of ontologies used for knowledge sharing

International Journal of Human-Computer Studies - Special issue: the role of formal ontology in the information technology
Reexamining the cluster hypothesis: scatter/gather on retrieval results

SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Optimization of inverted vector searches

SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Web document clustering: a feasibility demonstration

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Fast and effective text mining using linear-time document clustering

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Information Retrieval

Information Retrieval
Document clustering with committees

SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluation of hierarchical clustering algorithms for document datasets

Proceedings of the eleventh international conference on Information and knowledge management
Hierarchically Classifying Documents Using Very Few Words

ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Text Clustering Based on Good Aggregations

ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Frequent term-based text clustering

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
An Adaptive Meta-Clustering Approach: Combining the Information from Different Clustering Results

CSB '02 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Document clustering by concept factorization

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Document clustering via adaptive subspace iteration

Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Criterion functions for document clustering

Criterion functions for document clustering

Integration of semantic-based bipartite graph representation and mutual refinement strategy for biomedical literature clustering

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Biomedical ontology improves biomedical literature clustering performance: a comparison study

International Journal of Bioinformatics Research and Applications
Field independent probabilistic model for clustering multi-field documents

Information Processing and Management: an International Journal
Revealing Hidden Community Structures and Identifying Bridges in Complex Networks: An Application to Analyzing Contents of Web Pages for Browsing

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
A probabilistic model for clustering text documents with multiple fields

ECIR'07 Proceedings of the 29th European conference on IR research
Meta-composer: synthesizing online FRBR works from library resources

ECDL'10 Proceedings of the 14th European conference on Research and advanced technology for digital libraries
Enhanced clustering of biomedical documents using ensemble non-negative matrix factorization

Information Sciences: an International Journal
Recovering dialect geography from an unaligned comparable corpus

EACL 2012 Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH

Quantified Score

Hi-index	0.00

Visualization

Abstract

Document clustering has been used for better document retrieval, document browsing, and text mining in digital library. In this paper, we perform a comprehensive comparison study of various document clustering approaches such as three hierarchical methods (single-link, complete-link, and complete link), Bisecting K-means, K-means, and Suffix Tree Clustering in terms of the efficiency, the effectiveness, and the scalability. In addition, we apply a domain ontology to document clustering to investigate if the ontology such as MeSH improves clustering qualify for MEDLINE articles. Because an ontology is a formal, explicit specification of a shared conceptualization for a domain of interest, the use of ontologies is a natural way to solve traditional information retrieval problems such as synonym/hypernym/ hyponym problems. We conducted fairly extensive experiments based on different evaluation metrics such as misclassification index, F-measure, cluster purity, and Entropy on very large article sets from MEDLINE, the largest biomedical digital library in biomedicine.