Dutch word sense disambiguation: optimizing the localness of context

Authors:
Véronique Hoste;Walter Daelemans;Iris Hendrickx;Antal van den Bosch
Affiliations:
University of Antwerp, Belgium;University of Antwerp, Belgium;Tilburg University, The Netherlands;Tilburg University, The Netherlands
Venue:
WSD '02 Proceedings of the ACL-02 workshop on Word sense disambiguation: recent successes and future directions - Volume 8
Year:
2002

Abstract

We describe a new version of the Dutch word sense disambiguation system, trained and tested on a corrected version of the SENSEVAL-2 data. The system is an ensemble of word experts; each word expert is a memory-based classifier whose parameters are determined automatically through cross-validation on the training material. The original best-performing system, which used only local context features for disambiguation, is refined further through additional parallel cross-validation experiments that optimize both the algorithmic parameters and the amount of local context available to each word expert's memory-based kernel. This procedure yields an accuracy of 84.8% on the test material, improving on a baseline score of 77.2% and on the previous SENSEVAL-2 score of 84.2%. We show that cross-validation overfits: had the local context been held constant at two neighbouring words on either side of the target word, the system would have scored 85.0%.
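The word-expert idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain overlap-metric k-nearest-neighbour classifier as a stand-in for the memory-based kernel, a joint grid over context width and k rather than the parallel experiments mentioned above, and a handful of invented toy sentences for the Dutch word "bank" (sofa vs. financial institution). All function names and parameter grids are hypothetical.

```python
from collections import Counter
from itertools import product


def context_features(tokens, index, width):
    """Return `width` words left and right of tokens[index], padded with '_'."""
    padded = ["_"] * width + list(tokens) + ["_"] * width
    centre = index + width
    return tuple(padded[centre - width:centre] + padded[centre + 1:centre + width + 1])


def knn_classify(train, features, k):
    """Overlap-metric k-NN: distance is the number of mismatching feature slots."""
    ranked = sorted(
        (sum(a != b for a, b in zip(feats, features)), n)
        for n, (feats, _) in enumerate(train)
    )
    votes = Counter(train[n][1] for _, n in ranked[:k])
    return votes.most_common(1)[0][0]


def cross_validate(instances, width, k, folds=5):
    """Accuracy of one (width, k) setting under interleaved fold cross-validation."""
    data = [(context_features(t, i, width), sense) for t, i, sense in instances]
    folds = min(folds, len(data))
    correct = 0
    for fold in range(folds):
        test = data[fold::folds]
        train = [d for n, d in enumerate(data) if n % folds != fold]
        correct += sum(knn_classify(train, f, k) == s for f, s in test)
    return correct / len(data)


def train_word_expert(instances, widths=(1, 2, 3), ks=(1, 3)):
    """Select the (width, k) pair with the best cross-validated accuracy.

    The grids are assumptions made for this sketch; ties go to the first pair tried.
    """
    width, k = max(product(widths, ks),
                   key=lambda p: cross_validate(instances, p[0], p[1]))
    return {"width": width, "k": k, "instances": instances}


def disambiguate(experts, lemma, tokens, index):
    """Route the target word to its own expert and classify its local context."""
    expert = experts[lemma]
    w = expert["width"]
    train = [(context_features(t, i, w), s) for t, i, s in expert["instances"]]
    return knn_classify(train, context_features(tokens, index, w), expert["k"])


# Invented toy training data: (tokens, index of target word, sense label).
BANK = [
    ("hij zit op de bank".split(), 4, "sofa"),
    ("zij ligt op de bank".split(), 4, "sofa"),
    ("opa slaapt op de bank".split(), 4, "sofa"),
    ("geld staat bij de bank".split(), 4, "finance"),
    ("zij spaart bij de bank".split(), 4, "finance"),
    ("hij werkt bij de bank".split(), 4, "finance"),
]

experts = {"bank": train_word_expert(BANK)}  # one expert per ambiguous word

if __name__ == "__main__":
    sentence = "de kat slaapt op de bank".split()
    print(experts["bank"]["width"], experts["bank"]["k"])
    print(disambiguate(experts, "bank", sentence, 5))
```

Because each expert selects its own context width on held-out folds of a small training set, the selected width can fit noise in those folds; fixing the width for all experts, as in the final comparison of the abstract, removes that degree of freedom.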