Text document clustering provides an effective and intuitive navigation mechanism for organizing a large number of retrieval results by grouping documents into a small number of meaningful classes. Many well-known text clustering methods represent documents in a vector space built from a long list of words, which is often unsatisfactory for two reasons: first, it keeps the dimensionality of the data very high, and second, it ignores important relationships between terms, such as synonymy or antonymy. Our unsupervised method addresses both problems by using ANNIE and WordNet lexical categories together with the WordNet ontology to create a well-structured document vector space whose low dimensionality allows common clustering algorithms to perform well. For the clustering step we have chosen bisecting k-means and the Multipole tree, a modified version of the Antipole tree data structure, for their accuracy and speed, respectively.
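The general idea of the abstract can be illustrated with a minimal Python sketch, which is not the authors' implementation. It assumes a tiny hand-written lexicon (TOY_LEXICON, hypothetical) in place of a real WordNet lookup, omits ANNIE and the Multipole tree entirely, and uses a plain bisecting k-means with deterministic seeding. All function names are illustrative. Each document is reduced to a short vector of lexical-category frequencies, so synonyms and related terms fall into the same dimension and the vector space stays small.

```python
from collections import Counter

# Assumption: a toy word -> lexical category table standing in for WordNet.
TOY_LEXICON = {
    "dog": "noun.animal", "cat": "noun.animal", "puppy": "noun.animal",
    "bread": "noun.food", "cheese": "noun.food", "soup": "noun.food",
    "run": "verb.motion", "walk": "verb.motion",
    "eat": "verb.consumption", "drink": "verb.consumption",
}
CATEGORIES = sorted(set(TOY_LEXICON.values()))  # one dimension per category


def to_category_vector(text):
    """Map a document to normalized lexical-category frequencies."""
    words = text.lower().split()
    counts = Counter(TOY_LEXICON[w] for w in words if w in TOY_LEXICON)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[c] / total for c in CATEGORIES]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sse(vectors, indices):
    """Within-cluster sum of squared distances to the centroid."""
    center = mean([vectors[i] for i in indices])
    return sum(sq_dist(vectors[i], center) for i in indices)


def two_means(vectors, indices, iterations=10):
    """Split one cluster in two with 2-means and deterministic seeds."""
    center = mean([vectors[i] for i in indices])
    seed_a = max(indices, key=lambda i: sq_dist(vectors[i], center))
    seed_b = max(indices, key=lambda i: sq_dist(vectors[i], vectors[seed_a]))
    cent_a, cent_b = vectors[seed_a], vectors[seed_b]
    left, right = list(indices), []
    for _ in range(iterations):
        left = [i for i in indices
                if sq_dist(vectors[i], cent_a) <= sq_dist(vectors[i], cent_b)]
        right = [i for i in indices if i not in left]
        if not left or not right:
            break
        cent_a = mean([vectors[i] for i in left])
        cent_b = mean([vectors[i] for i in right])
    return left, right


def bisecting_kmeans(vectors, k):
    """Repeatedly bisect the cluster with the largest error until k clusters."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        worst = max(clusters, key=lambda c: sse(vectors, c))
        left, right = two_means(vectors, worst)
        if not left or not right:  # cluster cannot be split further
            break
        clusters.remove(worst)
        clusters += [left, right]
    return clusters


DOCS = [
    "the dog and the cat run",
    "a puppy can walk",
    "eat bread and cheese",
    "drink soup eat bread",
]

if __name__ == "__main__":
    vectors = [to_category_vector(d) for d in DOCS]
    for cluster in bisecting_kmeans(vectors, k=2):
        print([DOCS[i] for i in cluster])
```

In this sketch the four example documents share no content words across the animal pair or the food pair, yet each pair lands in the same cluster because its words map to the same categories. A word-level vector space would need one dimension per distinct word and would treat "dog" and "puppy" as unrelated.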