WordNet-based text document clustering

Authors:
Julian Sedding; Dimitar Kazakov
Affiliations:
University of York, Heslington, York, United Kingdom; University of York, Heslington, York, United Kingdom
Venue:
ROMAND '04 Proceedings of the 3rd Workshop on RObust Methods in Analysis of Natural Language Data
Year:
2004

Citing 4

Grouper: a dynamic clustering interface to Web search results
WWW '99 Proceedings of the eighth international conference on World Wide Web

A vector space model for automatic indexing
Communications of the ACM

Information Retrieval
Information Retrieval

Distinguishing systems and distinguishing senses: new evaluation methods for Word Sense Disambiguation
Natural Language Engineering

Cited 14

Text document clustering based on frequent word meaning sequences
Data & Knowledge Engineering

Combining statistics and semantics via ensemble model for document clustering
Proceedings of the 2009 ACM symposium on Applied Computing

Exploiting noun phrases and semantic relationships for text document clustering
Information Sciences: an International Journal

An Integration of Fuzzy Association Rules and WordNet for Document Clustering
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining

New Semantic Similarity Based Model for Text Clustering Using Extended Gloss Overlaps
MLDM '09 Proceedings of the 6th International Conference on Machine Learning and Data Mining in Pattern Recognition

Editorial: An integration of WordNet and fuzzy association rule mining for multi-label document clustering
Data & Knowledge Engineering

W-kmeans: clustering news articles using wordNet
KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part III

Detecting Intent of Web Queries Using Questions and Answers in CQA Corpus
WI-IAT '11 Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 01

Athena: text mining based discovery of scientific workflows in disperse repositories
RED'10 Proceedings of the Third international conference on Resource Discovery

Wikipedia-based smoothing for enhancing text clustering
AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology

A clustering technique for news articles using WordNet
Knowledge-Based Systems

Semantic smoothing for text clustering
Knowledge-Based Systems

Knowledge discovery in inspection reports of marine structures
Expert Systems with Applications: An International Journal

Clustering web documents using hierarchical representation with multi-granularity
World Wide Web

Abstract

Text document clustering can greatly simplify browsing large collections of documents by reorganizing them into a smaller number of manageable clusters. Algorithms to solve this task exist; however, the algorithms are only as good as the data they work on. Problems include ambiguity and synonymy, the former allowing for erroneous groupings and the latter causing similarities between documents to go unnoticed. In this research, naïve, syntax-based disambiguation is attempted by assigning each word a part-of-speech tag and by enriching the 'bag-of-words' data representation often used for document clustering with synonyms and hypernyms from WordNet.
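
The idea of enriching a bag-of-words representation can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the procedure in detail, so the tiny `SYNSETS` and `HYPERNYMS` tables below are hypothetical stand-ins for WordNet, the part-of-speech tags are supplied by hand rather than by a tagger, and the function names `enrich` and `cosine` are chosen only for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy lexicon standing in for WordNet (NOT real WordNet data).
# (word, part-of-speech) -> concept id shared by all synonyms of that sense.
SYNSETS = {
    ("car", "n"): "car.n",
    ("automobile", "n"): "car.n",
    ("truck", "n"): "truck.n",
    ("drive", "v"): "drive.v",
    ("drive", "n"): "drive.n",
}
# concept id -> more general concept (its hypernym).
HYPERNYMS = {
    "car.n": "motor_vehicle.n",
    "truck.n": "motor_vehicle.n",
    "motor_vehicle.n": "vehicle.n",
}


def enrich(tagged_tokens, depth=1):
    """Return a bag of words extended with concept ids and up to
    `depth` levels of hypernyms for each (word, pos) pair."""
    bag = Counter()
    for word, pos in tagged_tokens:
        bag[word] += 1
        # The POS tag selects which entry of the lexicon applies.
        concept = SYNSETS.get((word, pos))
        for _ in range(depth + 1):
            if concept is None:
                break
            bag[concept] += 1
            concept = HYPERNYMS.get(concept)
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two toy documents with no surface words in common (tags assigned by hand).
doc_a = [("car", "n"), ("drive", "v")]
doc_b = [("automobile", "n"), ("truck", "n")]

plain_a = Counter(word for word, _ in doc_a)
plain_b = Counter(word for word, _ in doc_b)

print("plain bag-of-words similarity:", cosine(plain_a, plain_b))
print("enriched similarity:", cosine(enrich(doc_a), enrich(doc_b)))
```

In this toy setting the two documents share no words, so the plain representation sees no similarity, while the enriched one matches them through the shared synonym concept and the shared hypernym. The `depth` parameter is an assumption of the sketch; it controls how many hypernym levels are added and therefore how general the added features become.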