Evaluation of clustering algorithms for word sense disambiguation

Authors:
Bartosz Broda;Wojciech Mazur
Affiliations:
Institute of Informatics, Wroclaw University of Technology, 50-370 Wroclaw, Poland;Institute of Informatics, Wroclaw University of Technology, 50-370 Wroclaw, Poland
Venue:
International Journal of Data Analysis Techniques and Strategies
Year:
2012

Abstract

Word sense disambiguation in text remains a difficult problem because the best supervised methods require laborious and costly preparation of training data. This work evaluates selected clustering algorithms on the task of word sense disambiguation. We used five datasets covering two languages (English and Polish). Five clustering algorithms (k-means, k-medoids, hierarchical agglomerative clustering, hierarchical divisive clustering, and graph-partitioning-based clustering) and two weighting schemes were tested. The best parameters for each algorithm were chosen using 5 × 2 cross-validation. The BCubed measure was used to evaluate the clusterings. We conclude that, with these settings, agglomerative hierarchical clustering achieves the best results on all the datasets.
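
For readers unfamiliar with BCubed, the sketch below shows one common formulation of the measure: per-item precision and recall, averaged over all items, combined into an F-score. This is an illustration only. The abstract does not say which BCubed variant or weighting the authors report, and the function name `bcubed` and the toy labels are assumptions introduced here, not taken from the paper.

```python
from collections import Counter


def bcubed(clusters, gold):
    """Return (precision, recall, F1) under the standard BCubed definition.

    clusters[i] is the cluster assigned to item i by the algorithm;
    gold[i] is the reference sense of item i. Hypothetical helper,
    not the authors' implementation.
    """
    if len(clusters) != len(gold):
        raise ValueError("clusters and gold must have the same length")
    n = len(clusters)
    if n == 0:
        raise ValueError("at least one item is required")

    cluster_sizes = Counter(clusters)          # items per induced cluster
    gold_sizes = Counter(gold)                 # items per gold sense
    overlap = Counter(zip(clusters, gold))     # items sharing cluster AND sense

    # Per-item precision: share of the item's cluster that has its gold sense.
    precision = sum(overlap[(c, g)] / cluster_sizes[c]
                    for c, g in zip(clusters, gold)) / n
    # Per-item recall: share of the item's gold sense found in its cluster.
    recall = sum(overlap[(c, g)] / gold_sizes[g]
                 for c, g in zip(clusters, gold)) / n

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: four occurrences of an ambiguous word, two induced clusters.
p, r, f = bcubed([0, 0, 1, 1], ["a", "a", "a", "b"])
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Because each item contributes its own precision and recall, the measure penalizes both merging distinct senses into one cluster (lower precision) and splitting one sense across clusters (lower recall).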