Recent research shows that an ontology used as background knowledge can improve document clustering quality through its concept hierarchy. Previous studies use term semantic similarity as an important measure for incorporating domain knowledge into the clustering process, for example in clustering initialization and term re-weighting. However, few studies have examined how different types of term similarity measures affect clustering performance in a given domain. In this paper, we conduct a comparative study of how different term semantic similarity measures, including path-based, information-content-based, and feature-based measures, affect document clustering. We evaluate term re-weighting as an important method for integrating a domain ontology into the clustering process. We apply k-means clustering to one real-world text dataset, our own corpus generated from PubMed. Experimental results on 8 different semantic measures show that: (1) no single type of similarity measure significantly outperforms the others; (2) several similarity measures perform more stably than the others; (3) term re-weighting has a positive effect on medical document clustering, but the effect might not be significant when documents contain few terms.
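To make the idea of ontology-based term re-weighting concrete, the following is a minimal sketch, not the method used in the paper. It assumes a toy concept hierarchy (the `PARENT` table is hypothetical and not drawn from any real ontology), a simple path-based similarity of the form 1 / (1 + path length), and a re-weighting rule in which each term's weight is increased by the similarity-weighted weights of sufficiently similar terms in the same document. The function names and the `threshold` parameter are illustrative assumptions.

```python
# Hypothetical toy concept hierarchy (child -> parent); an assumption for
# illustration only, not taken from a real ontology.
PARENT = {
    "disease": None,
    "cancer": "disease",
    "infection": "disease",
    "leukemia": "cancer",
    "lymphoma": "cancer",
    "influenza": "infection",
}


def ancestors(term):
    """Return the chain [term, parent, ..., root]."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def path_length(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    depth_in_a = {t: i for i, t in enumerate(ancestors(a))}
    for j, t in enumerate(ancestors(b)):
        if t in depth_in_a:  # first common ancestor found
            return depth_in_a[t] + j
    raise ValueError("terms share no common ancestor")


def path_similarity(a, b):
    """One simple path-based measure (assumed form): 1 / (1 + path length)."""
    return 1.0 / (1.0 + path_length(a, b))


def reweight(doc, threshold=0.3):
    """Add to each term's weight the similarity-weighted weights of the other
    terms in the document whose similarity reaches the threshold (assumed rule)."""
    new_weights = {}
    for t1, w1 in doc.items():
        boost = 0.0
        for t2, w2 in doc.items():
            if t2 == t1:
                continue
            sim = path_similarity(t1, t2)
            if sim >= threshold:
                boost += sim * w2
        new_weights[t1] = w1 + boost
    return new_weights


if __name__ == "__main__":
    # Term weights for one document (for example, TF-IDF values).
    doc = {"leukemia": 1.0, "lymphoma": 2.0, "influenza": 1.0}
    print(reweight(doc))
```

In this sketch, "leukemia" and "lymphoma" reinforce each other because they share the parent "cancer", while "influenza" is left unchanged because its similarity to both falls below the threshold. The re-weighted vectors could then be passed to any k-means implementation. An information-content-based or feature-based measure would replace only `path_similarity`.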