A vector space model for automatic indexing
Communications of the ACM
Semantic similarity methods in wordNet and their application to information retrieval on the web
Proceedings of the 7th annual ACM international workshop on Web information and data management
Automatically labeling hierarchical clusters
dg.o '06 Proceedings of the 2006 international conference on Digital government research
Parallel bisecting k-means with prediction clustering algorithm
The Journal of Supercomputing
k-means++: the advantages of careful seeding
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
PeRSSonal's core functionality evaluation: Enhancing text labeling through personalized summaries
Data & Knowledge Engineering
The Evaluation Measure of Text Clustering for the Variable Number of Clusters
ISNN '07 Proceedings of the 4th international symposium on Neural Networks: Part II--Advances in Neural Networks
Improving Text Summarization Using Noun Retrieval Techniques
KES '08 Proceedings of the 12th international conference on Knowledge-Based Intelligent Information and Engineering Systems, Part II
An Integration of Fuzzy Association Rules and WordNet for Document Clustering
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Enhancing cluster labeling using wikipedia
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
WordNet-based text document clustering
ROMAND '04 Proceedings of the 3rd Workshop on RObust Methods in Analysis of Natural Language Data
Generic title labeling for clustered documents
Expert Systems with Applications: An International Journal
Hi-index | 0.00 |
Document clustering is a powerful technique that has been widely used for organizing data into smaller and manageable information kernels. Several approaches have been proposed suffering however from problems like synonymy, ambiguity and lack of a descriptive content marking of the generated clusters. We are proposing the enhancement of standard kmeans algorithm using the external knowledge from WordNet hypernyms in a twofold manner: enriching the "bag of words" used prior to the clustering process and assisting the label generation procedure following it. Our experimentation revealed a significant improvement over standard kmeans for a corpus of news articles derived from major news portals. Moreover, the cluster labeling process generates useful and of high quality cluster tags.