Parameter optimization for machine-learning of word sense disambiguation

  • Authors:
  • V. Hoste; I. Hendrickx; W. Daelemans; A. van den Bosch

  • Affiliations:
  • V. Hoste: CNTS Language Technology Group, University of Antwerp, Belgium (hoste@uia.ua.ac.be); I. Hendrickx: ILK Computational Linguistics, Tilburg University, The Netherlands (I.H.E.Hendrickx@kub.nl); W. Daelemans: CNTS Language Technology Group, University of Antwerp, Belgium (daelem@uia.ua.ac.be) and ILK Computational Linguistics, Tilburg University, The Netherlands; A. van den Bosch: ILK Computational Linguistics, Tilburg University, The Netherlands (Antal.vdnBosch@kub.nl) and WhizBang! Labs – Research, Pittsburgh, PA, USA

  • Venue:
  • Natural Language Engineering
  • Year:
  • 2002

Abstract

Various Machine Learning (ML) approaches have been shown to produce relatively successful Word Sense Disambiguation (WSD) systems. Since there are still unexplained differences among the performance measurements of different algorithms, a deeper investigation into which algorithm has the right ‘bias’ for this task is warranted. In this paper, we show that this is not easy to accomplish, due to intricate interactions between information sources, parameter settings, and properties of the training data. We investigate the impact of parameter optimization on generalization accuracy in a memory-based learning approach to English and Dutch WSD. A ‘word-expert’ architecture was adopted, yielding a set of classifiers, each specialized in a single wordform. Each word-expert consists of multiple memory-based learning classifiers, each taking a different information source as input, combined in a voting scheme. We optimized the architectural and parametric settings for each individual word-expert by performing cross-validation experiments on the learning material. The results of these experiments show that varying both the algorithmic parameters and the information sources available to the classifiers leads to large fluctuations in accuracy. We demonstrate that optimization per word-expert leads to a significant overall improvement in the generalization accuracy of the produced WSD systems.
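
To make the described architecture concrete, the sketch below illustrates, under simplifying assumptions, the per-word-expert workflow: several memory-based (k-nearest-neighbour) component classifiers, each trained on a different information source for one ambiguous word, have their k parameter tuned by leave-one-out cross-validation on the training material, and their predictions are combined by majority voting. This is not the authors' implementation; the data, feature encodings, parameter range, and helper names are hypothetical, and the actual system explores a much richer parameter space.

```python
# Minimal sketch (assumptions throughout) of one "word-expert":
# per-source k-NN classifiers, per-classifier parameter tuning by
# leave-one-out cross-validation, and majority voting at prediction time.
from collections import Counter


def knn_predict(train, query, k):
    """Memory-based (k-NN) prediction with a simple overlap distance."""
    scored = sorted(train, key=lambda ex: sum(a != b for a, b in zip(ex[0], query)))
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(train, k):
    """Leave-one-out accuracy, used to select k for one component classifier."""
    hits = 0
    for i, (feats, label) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        hits += knn_predict(rest, feats, k) == label
    return hits / len(train)


def optimize_word_expert(sources, k_values=(1, 3, 5)):
    """Tune k separately for each information source of one word-expert."""
    experts = []
    for train in sources:
        best_k = max(k_values, key=lambda k: loo_accuracy(train, k))
        experts.append((train, best_k))
    return experts


def vote(experts, queries):
    """Majority vote over the tuned component classifiers' sense predictions."""
    preds = [knn_predict(train, q, k) for (train, k), q in zip(experts, queries)]
    return Counter(preds).most_common(1)[0][0]


# Toy word-expert for "bank" with two hypothetical information sources:
# local word context and keyword features, each with its own feature encoding.
local_ctx = [(("bank", "of"), "river"), (("the", "bank"), "finance"),
             (("a", "bank"), "finance"), (("river", "bank"), "river")]
keywords = [(("water", "yes"), "river"), (("money", "yes"), "finance"),
            (("loan", "yes"), "finance"), (("shore", "yes"), "river")]

experts = optimize_word_expert([local_ctx, keywords])
print(vote(experts, [("river", "bank"), ("water", "yes")]))  # -> "river"
```

The point the sketch tries to convey is the one argued in the paper: because each component classifier reacts differently to its parameter settings and information source, tuning is done per word-expert on its own training material rather than with a single global setting for all words.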