A lemma-based approach to a maximum entropy word sense disambiguation system for Dutch

Authors:
Tanja Gaustad
Affiliations:
University of Groningen, AS Groningen, The Netherlands
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

In this paper, we present a corpus-based, supervised word sense disambiguation (WSD) system for Dutch that combines statistical classification (maximum entropy) with linguistic information. Instead of building an individual classifier for each ambiguous wordform, we introduce a lemma-based approach. The advantage of this novel method is that it clusters all inflected forms of an ambiguous word into one classifier, thereby augmenting the training material available to the algorithm. Testing the lemma-based model on the Dutch Senseval-2 test data, we achieve a significant increase in accuracy over the wordform model. In addition, the lemma-based WSD system is smaller and more robust.
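The difference between wordform-based and lemma-based classifiers can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the system described in the paper: the toy data (Dutch "bank" and its plural "banken"), the bag-of-context-words features, the hypothetical `<bias>` feature, and the plain batch gradient ascent trainer with an L2 penalty are all invented here. The sketch fits a conditional maximum entropy model p(s | c) ∝ exp(Σ λ_i f_i(c, s)) once per key. With wordform keys, "bank" and "banken" each get three training examples. With lemma keys, a single "bank" classifier pools all six.

```python
import math
from collections import defaultdict
from typing import NamedTuple


class Instance(NamedTuple):
    wordform: str   # inflected form as it occurs in the text
    lemma: str      # base form shared by all inflections
    context: tuple  # context words used as features
    sense: str      # gold sense label


# Hypothetical toy data, invented for illustration (not Senseval-2):
# Dutch "bank" (financial institution vs. seat) and its plural "banken".
TRAIN = [
    Instance("bank", "bank", ("geld", "rekening"), "financial"),
    Instance("bank", "bank", ("zitten", "park"), "seat"),
    Instance("bank", "bank", ("lening", "geld"), "financial"),
    Instance("banken", "bank", ("rente", "geld"), "financial"),
    Instance("banken", "bank", ("park", "hout"), "seat"),
    Instance("banken", "bank", ("zitten", "tuin"), "seat"),
]


def group_by(data, key):
    """Pool training examples into one bucket per classifier key."""
    groups = defaultdict(list)
    for inst in data:
        # "<bias>" is a hypothetical always-on feature modelling the sense prior
        groups[key(inst)].append((inst.context + ("<bias>",), inst.sense))
    return groups


def predict_proba(weights, senses, feats):
    """p(sense | context) = exp(sum of active weights) / Z."""
    scores = {s: sum(weights.get((f, s), 0.0) for f in feats) for s in senses}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exps.values())
    return {s: e / z for s, e in exps.items()}


def train_maxent(examples, epochs=300, lr=0.1, l2=0.01):
    """Batch gradient ascent on the L2-penalised conditional log-likelihood."""
    senses = sorted({sense for _, sense in examples})
    weights = defaultdict(float)  # (feature, sense) -> lambda
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in examples:
            probs = predict_proba(weights, senses, feats)
            for f in feats:
                grad[(f, gold)] += 1.0        # observed feature count
                for s in senses:
                    grad[(f, s)] -= probs[s]  # expected feature count
        for k in set(grad) | set(weights):
            weights[k] += lr * (grad[k] - l2 * weights[k])
    return senses, dict(weights)


def classify(model, context):
    senses, weights = model
    probs = predict_proba(weights, senses, tuple(context) + ("<bias>",))
    return max(probs, key=probs.get)


by_form = {k: train_maxent(v) for k, v in group_by(TRAIN, lambda i: i.wordform).items()}
by_lemma = {k: train_maxent(v) for k, v in group_by(TRAIN, lambda i: i.lemma).items()}

test_context = ("lening", "rekening")  # occurrence of "banken" in a financial context
print(classify(by_form["banken"], test_context))  # seat: "banken" model never saw these words
print(classify(by_lemma["bank"], test_context))   # financial: pooled "bank" examples cover them
```

In this toy setting, the "banken" wordform classifier has no weights for "lening" or "rekening", so it falls back on its sense prior. The lemma classifier learned those words from occurrences of the singular. The lemma grouping also yields one classifier instead of two, which loosely mirrors the abstract's observation that the lemma-based system is smaller.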