A new unsupervised method for document clustering by using WordNet lexical and conceptual relations

Authors:
Diego Reforgiato Recupero
Affiliations:
Dipartimento di Matematica e Informatica, Università degli Studi di Catania, Catania, Italy
Venue:
Information Retrieval
Year:
2007

Citing: 0
Cited by: 10

Parsimonious concept modeling
Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval

Clustering Documents Using a Wikipedia-Based Concept Representation
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

Editorial: An integration of WordNet and fuzzy association rule mining for multi-label document clustering
Data & Knowledge Engineering

Learning ontology resolution for document representation and its applications in text mining
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

Ontology enhancement and concept granularity learning: keeping yourself current and adaptive
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Enriching short text representation in microblog for clustering
Frontiers of Computer Science in China

A formal concept analysis-based domain-specific thesaurus and its application in document representation
ICCSA'10 Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part III

Concept chaining utilizing meronyms in text characterization
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries

Learning a concept-based document similarity measure
Journal of the American Society for Information Science and Technology

Clustering facilitated web services discovery model based on supervised term weighting and adaptive metric learning
International Journal of Web Engineering and Technology

Abstract

Text document clustering provides an effective and intuitive navigation mechanism for organizing a large number of retrieval results by grouping documents into a small number of meaningful classes. Many well-known text clustering methods use a long list of words as the vector space, which is often unsatisfactory for two reasons: first, it keeps the dimensionality of the data very high, and second, it ignores important relationships between terms, such as synonymy or antonymy. Our unsupervised method addresses both problems by using ANNIE and WordNet lexical categories and the WordNet ontology to create a well-structured document vector space whose low dimensionality allows common clustering algorithms to perform well. For the clustering step we have chosen bisecting k-means and the Multipole tree, a modified version of the Antipole tree data structure, for their accuracy and speed, respectively.
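
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the author's implementation. It assumes a tiny hand-made lexicon (`TOY_CATEGORIES`, hypothetical) in place of the real WordNet and ANNIE lexical categories, builds one vector dimension per category rather than per word, and then applies a plain bisecting k-means that always splits the cluster with the largest sum of squared errors (one common variant, chosen here as an assumption). The Multipole tree is not sketched. All function names are illustrative.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for WordNet lexical categories.
# The entries are made up for illustration only.
TOY_CATEGORIES = {
    "dog": "noun.animal", "cat": "noun.animal", "horse": "noun.animal",
    "bank": "noun.possession", "loan": "noun.possession",
    "money": "noun.possession",
    "run": "verb.motion", "walk": "verb.motion",
}
AXES = sorted(set(TOY_CATEGORIES.values()))  # one dimension per category


def category_vector(text):
    """Map a document to category frequencies (dimension = len(AXES))."""
    words = text.lower().split()
    counts = Counter(TOY_CATEGORIES[w] for w in words if w in TOY_CATEGORIES)
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[axis] / total for axis in AXES]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    return [sum(col) / len(points) for col in zip(*points)]


def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def two_means(points, iters=20):
    """2-means with deterministic seeds: first point and the one farthest from it."""
    c0 = points[0]
    c1 = max(points, key=lambda p: sq_dist(p, c0))
    left, right = points, []
    for _ in range(iters):
        new_left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
        new_right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
        if not new_left or not new_right:
            break  # cannot split (e.g. all points identical)
        if new_left == left and new_right == right:
            break  # assignments stopped changing
        left, right = new_left, new_right
        c0, c1 = centroid(left), centroid(right)
    return left, right


def bisecting_kmeans(points, k):
    """Repeatedly split the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        worst = max(clusters, key=sse)
        left, right = two_means(worst)
        if not right:
            break  # no further split is possible
        clusters.remove(worst)
        clusters += [left, right]
    return clusters


docs = ["dog cat horse", "cat dog run", "bank loan money", "money loan walk"]
vectors = [category_vector(d) for d in docs]
clusters = bisecting_kmeans(vectors, 2)
```

In this toy setup the eight vocabulary words collapse into three category dimensions, which is the kind of dimensionality reduction the abstract describes, although the real category inventory and word-sense handling are not modeled here.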