W-kmeans: clustering news articles using WordNet

  • Authors: Christos Bouras; Vassilis Tsogkas

  • Affiliations: Computer Engineering and Informatics Department, University of Patras, Greece and Research Academic Computer Technology Institute, Patras, Greece; Computer Engineering and Informatics Department, University of Patras, Greece

  • Venue: KES'10 Proceedings of the 14th international conference on Knowledge-based and intelligent information and engineering systems: Part III
  • Year: 2010

Abstract

Document clustering is a powerful technique that has been widely used to organize data into smaller, more manageable information kernels. Several approaches have been proposed, but they suffer from problems such as synonymy, ambiguity, and the lack of descriptive labels for the generated clusters. We propose enhancing the standard k-means algorithm with external knowledge from WordNet hypernyms in two ways: enriching the "bag of words" before the clustering process and assisting the label generation procedure that follows it. Our experiments showed a significant improvement over standard k-means on a corpus of news articles drawn from major news portals. Moreover, the cluster labeling process generates useful, high-quality cluster tags.
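The two uses of hypernyms described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `TOY_HYPERNYMS` table is a hypothetical stand-in for real WordNet lookups, and the helper names `enrich`, `cosine`, and `label_cluster` are assumptions made for illustration. The sketch only shows why adding hypernyms to the bag of words can bring related articles closer together before any k-means step, and how shared hypernyms might serve as candidate cluster labels.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet hypernym lookups (illustrative data only).
TOY_HYPERNYMS = {
    "dog": ["canine", "animal"],
    "cat": ["feline", "animal"],
}

def enrich(tokens, hypernyms=TOY_HYPERNYMS):
    """Build a bag of words and add each token's hypernyms to it."""
    bag = Counter(tokens)
    for tok in tokens:
        for hyper in hypernyms.get(tok, []):
            bag[hyper] += 1
    return bag

def cosine(a, b):
    """Cosine similarity between two bags of words (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def label_cluster(bags, top_n=1):
    """Assumed labeling heuristic: most frequent terms across a cluster's bags."""
    total = Counter()
    for bag in bags:
        total.update(bag)
    return [word for word, _ in total.most_common(top_n)]

doc_a = ["dog", "barks"]
doc_b = ["cat", "meows"]

raw_sim = cosine(Counter(doc_a), Counter(doc_b))    # no shared words
enriched_sim = cosine(enrich(doc_a), enrich(doc_b))  # share the hypernym "animal"
labels = label_cluster([enrich(doc_a), enrich(doc_b)])
```

In this toy case the two documents share no surface words, so their raw similarity is zero, while the enriched bags overlap on the shared hypernym. A standard k-means run on the enriched vectors would then have a signal for grouping them, and the shared hypernym becomes a natural label candidate.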