Finding the optimal number of clusters for word sense disambiguation

Authors: Bartosz Broda; Paweł Kędzia
Affiliations: Institute of Informatics, Wrocław University of Technology, Poland (both authors)
Venue: TSD'11 Proceedings of the 14th International Conference on Text, Speech and Dialogue
Year: 2011

Abstract

Ambiguity is an inherent problem for many tasks in Natural Language Processing. Unsupervised and semi-supervised approaches to ambiguity resolution are appealing because they lower the cost of manual labour. Typically, these methods struggle to estimate the number of senses without supervision. This paper investigates stopping functions applied to clustering algorithms as a way of estimating the number of senses. The experiments were performed for Polish and English. We found that estimation based on the PK2 stopping function gives encouraging results, but only when using coarse-grained distinctions between senses.
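
The abstract does not define PK2, so the following is only a minimal sketch of how a ratio-based stopping function of this kind might pick the number of clusters. It assumes the formulation commonly attributed to PK2: the score for k clusters is the clustering criterion value at k divided by the value at k-1, and the chosen k is the one whose score is closest to 1 while still exceeding 1 plus one standard deviation of all scores. The function names and the criterion values are hypothetical and are not taken from the paper.

```python
from statistics import pstdev


def pk2_scores(crfun_values):
    """Ratio of criterion values for k and k-1 clusters.

    crfun_values[i] is assumed to hold the clustering criterion
    value obtained with k = i + 1 clusters (hypothetical input).
    """
    return {
        k: crfun_values[k - 1] / crfun_values[k - 2]
        for k in range(2, len(crfun_values) + 1)
    }


def select_k(crfun_values):
    """Pick k whose ratio is closest to 1 but above 1 + std of all ratios.

    This threshold rule is an assumption about PK2, not the paper's code.
    Falls back to a single cluster when no ratio clears the threshold.
    """
    scores = pk2_scores(crfun_values)
    if not scores:
        return 1
    threshold = 1.0 + pstdev(list(scores.values()))
    outside = {k: s for k, s in scores.items() if s > threshold}
    if not outside:
        return 1
    return min(outside, key=lambda k: outside[k] - 1.0)


# Made-up criterion values for k = 1..6: large gains up to k = 3,
# then the curve flattens.
example_values = [10.0, 20.0, 30.0, 31.0, 31.5, 31.8]
estimated_senses = select_k(example_values)
```

In this sketch the estimated number of senses for a word would be the k at which adding another cluster stops improving the criterion by a meaningful margin.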