Evaluation of clustering algorithms for word sense disambiguation

  • Authors:
  • Bartosz Broda; Wojciech Mazur

  • Affiliations:
  • Institute of Informatics, Wroclaw University of Technology, 50-370 Wroclaw, Poland; Institute of Informatics, Wroclaw University of Technology, 50-370 Wroclaw, Poland

  • Venue:
  • International Journal of Data Analysis Techniques and Strategies
  • Year:
  • 2012

Abstract

Word sense disambiguation in text remains a difficult problem because the best supervised methods require laborious and costly preparation of training data. This work evaluates selected clustering algorithms on the task of word sense disambiguation. We used five datasets covering two languages (English and Polish). Five clustering algorithms (k-means, k-medoids, hierarchical agglomerative clustering, hierarchical divisive clustering, and graph-partitioning-based clustering) and two weighting schemes were tested. The best parameters for each algorithm were chosen using 5 × 2 cross-validation. The BCubed measure was employed to evaluate the clusterings. We conclude that, with these settings, agglomerative hierarchical clustering achieves the best results on all the datasets.
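
To illustrate the evaluation measure named in the abstract, the following is a minimal sketch of BCubed precision, recall and F-score in Python. It assumes the standard item-averaged definition, with the F-score taken as the harmonic mean of the averaged precision and recall; the abstract does not say which variant the authors used. The function name `bcubed` and the toy inputs are hypothetical and serve only as illustration.

```python
from collections import Counter


def bcubed(clusters, senses):
    """Return (precision, recall, F) under the item-averaged BCubed definition.

    clusters[i] is the cluster assigned to occurrence i (system output);
    senses[i] is its gold-standard sense. Both names are illustrative.
    """
    if len(clusters) != len(senses):
        raise ValueError("clusters and senses must have the same length")
    n = len(clusters)
    if n == 0:
        raise ValueError("at least one item is required")

    cluster_sizes = Counter(clusters)           # items per cluster
    sense_sizes = Counter(senses)               # items per gold sense
    pair_sizes = Counter(zip(clusters, senses)) # items sharing cluster and sense

    # For each item: the share of its cluster that has the same sense
    # (precision) and the share of its sense found in the same cluster (recall).
    precision = sum(
        pair_sizes[(c, s)] / cluster_sizes[c] for c, s in zip(clusters, senses)
    ) / n
    recall = sum(
        pair_sizes[(c, s)] / sense_sizes[s] for c, s in zip(clusters, senses)
    ) / n

    # precision and recall are strictly positive here, so no division by zero.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Toy example: four occurrences of one word, two clusters, two senses.
    p, r, f = bcubed([0, 0, 1, 1], ["a", "a", "a", "b"])
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Because the measure averages over items rather than over clusters, large clusters carry proportionally more weight, and a clustering that places every occurrence in its own cluster scores perfect precision but low recall.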