Text document clustering can greatly simplify browsing large collections of documents by reorganizing them into a smaller number of manageable clusters. Algorithms for this task exist, but they are only as good as the data they operate on. Problems in the data include ambiguity and synonymy: ambiguity can produce erroneous groupings, while synonymy causes similarities between documents to go unnoticed. In this research, a naïve, syntax-based form of disambiguation is attempted by assigning each word a part-of-speech tag, and the 'bag-of-words' representation often used for document clustering is enriched with synonyms and hypernyms from WordNet.
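The enrichment idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the text is already tokenized and POS-tagged (the coarse tags `NOUN` and `VERB` are an assumption), and it replaces WordNet with a tiny hypothetical `LEXICON` dictionary so the example runs without external data. With NLTK, the lookups would instead go through `wordnet.synsets(word, pos=...)`, `Synset.lemma_names()` and `Synset.hypernyms()`. The names `enrich_bag_of_words` and `cosine` are likewise illustrative.

```python
from collections import Counter
from math import sqrt

# Hypothetical miniature lexicon standing in for WordNet: maps a
# (lemma, part-of-speech) pair to a few synonyms and hypernyms.
LEXICON = {
    ("car", "NOUN"): {"synonyms": ["automobile"], "hypernyms": ["motor_vehicle"]},
    ("automobile", "NOUN"): {"synonyms": ["car"], "hypernyms": ["motor_vehicle"]},
    ("bank", "NOUN"): {"synonyms": ["depository"], "hypernyms": ["financial_institution"]},
    ("bank", "VERB"): {"synonyms": ["tilt"], "hypernyms": ["incline"]},
}


def enrich_bag_of_words(tagged_tokens, enrich=True):
    """Build a bag of words from (word, POS) pairs.

    Features are keyed by (word, POS), so the noun 'bank' and the verb
    'bank' stay distinct (naive, syntax-based disambiguation). When
    `enrich` is true, synonyms and hypernyms from the lexicon are added
    as extra features with the same POS tag.
    """
    bag = Counter()
    for word, pos in tagged_tokens:
        word = word.lower()
        bag[(word, pos)] += 1
        if enrich:
            entry = LEXICON.get((word, pos), {})
            for related in entry.get("synonyms", []) + entry.get("hypernyms", []):
                bag[(related, pos)] += 1
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    doc_a = [("car", "NOUN"), ("crashed", "VERB")]
    doc_b = [("automobile", "NOUN"), ("crashed", "VERB")]
    # Plain bag of words: only "crashed" is shared, so similarity is about 0.5.
    print(cosine(enrich_bag_of_words(doc_a, enrich=False),
                 enrich_bag_of_words(doc_b, enrich=False)))
    # Enriched: the synonym and shared hypernym make the vectors identical.
    print(cosine(enrich_bag_of_words(doc_a), enrich_bag_of_words(doc_b)))
```

Keying features by word and tag keeps the noun and verb senses of 'bank' apart, while the added synonyms and hypernyms let 'car' and 'automobile' contribute to document similarity. A real system would still have to decide which WordNet senses to expand, since a word often has several synsets even within one part of speech.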