Recent trends in hierarchic document clustering: a critical review
Information Processing and Management: an International Journal
Scatter/Gather: a cluster-based approach to browsing large document collections
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Toward principles for the design of ontologies used for knowledge sharing
International Journal of Human-Computer Studies - Special issue: the role of formal ontology in the information technology
Reexamining the cluster hypothesis: scatter/gather on retrieval results
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Optimization of inverted vector searches
SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Web document clustering: a feasibility demonstration
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Fast algorithms for projected clustering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Fast and effective text mining using linear-time document clustering
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Information Retrieval
Document clustering with committees
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Evaluation of hierarchical clustering algorithms for document datasets
Proceedings of the eleventh international conference on Information and knowledge management
Hierarchically Classifying Documents Using Very Few Words
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Text Clustering Based on Good Aggregations
ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
Frequent term-based text clustering
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
An Adaptive Meta-Clustering Approach: Combining the Information from Different Clustering Results
CSB '02 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Document clustering by concept factorization
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Document clustering via adaptive subspace iteration
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Criterion functions for document clustering
Criterion functions for document clustering
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Biomedical ontology improves biomedical literature clustering performance: a comparison study
International Journal of Bioinformatics Research and Applications
Field independent probabilistic model for clustering multi-field documents
Information Processing and Management: an International Journal
WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
A probabilistic model for clustering text documents with multiple fields
ECIR'07 Proceedings of the 29th European conference on IR research
Meta-composer: synthesizing online FRBR works from library resources
ECDL'10 Proceedings of the 14th European conference on Research and advanced technology for digital libraries
Enhanced clustering of biomedical documents using ensemble non-negative matrix factorization
Information Sciences: an International Journal
Recovering dialect geography from an unaligned comparable corpus
EACL 2012 Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH
Hi-index | 0.00 |
Document clustering has been used for better document retrieval, document browsing, and text mining in digital library. In this paper, we perform a comprehensive comparison study of various document clustering approaches such as three hierarchical methods (single-link, complete-link, and complete link), Bisecting K-means, K-means, and Suffix Tree Clustering in terms of the efficiency, the effectiveness, and the scalability. In addition, we apply a domain ontology to document clustering to investigate if the ontology such as MeSH improves clustering qualify for MEDLINE articles. Because an ontology is a formal, explicit specification of a shared conceptualization for a domain of interest, the use of ontologies is a natural way to solve traditional information retrieval problems such as synonym/hypernym/ hyponym problems. We conducted fairly extensive experiments based on different evaluation metrics such as misclassification index, F-measure, cluster purity, and Entropy on very large article sets from MEDLINE, the largest biomedical digital library in biomedicine.