The WSD development environment

  • Authors:
  • Rafał Młodzki; Adam Przepiórkowski

  • Affiliations:
  • Institute of Computer Science PAS, ul. Ordona 21, Warszawa, Poland; Institute of Computer Science PAS, ul. Ordona 21, Warszawa, Poland and University of Warsaw, Krakowskie Przedmieście, Warszawa, Poland

  • Venue:
  • LTC'09 Proceedings of the 4th conference on Human language technology: challenges for computer science and linguistics
  • Year:
  • 2009

Abstract

In this paper we present the Word Sense Disambiguation Development Environment (WSDDE), a platform for testing various Word Sense Disambiguation (WSD) technologies, together with the results of the first experiments in applying the platform to WSD in Polish. The current development version of the environment facilitates the construction and evaluation of WSD methods in the supervised Machine Learning (ML) paradigm using various knowledge sources. Experiments were conducted on a small, manually sense-tagged corpus of 13 Polish words. The usual groups of features were implemented, including bag-of-words, parts-of-speech, and words with their positions (each with different settings), in combination with popular ML algorithms (including Naive Bayes, Decision Trees and Support Vector Machines). The aim was to test to what extent standard approaches to the English WSD task can be adapted to languages with free word order and rich inflection, such as Polish. In accordance with earlier results in the literature, the initial experiments suggest that these standard approaches are relatively well suited to Polish. On the other hand, contrary to earlier findings, the experiments also show that adding some features beyond bag-of-words increases the average accuracy.
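To make the supervised setup described in the abstract more concrete, the following is a minimal sketch of combining bag-of-words features with position-tagged words and feeding them to a Naive Bayes classifier. It does not reproduce WSDDE itself: the names `extract_features` and `NaiveBayesWSD`, the window size, the smoothing constant, and the four toy sentences for the Polish word "zamek" (castle vs. lock) are all assumptions made up for illustration, not data or code from the paper.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, target_index, window=2):
    """Return bag-of-words features plus position-tagged words near the target.

    Hypothetical helper: feature names and the window size are illustrative.
    """
    features = []
    for i, token in enumerate(tokens):
        if i == target_index:
            continue  # the ambiguous word itself is not a context feature
        word = token.lower()
        features.append("bow=" + word)
        offset = i - target_index
        if abs(offset) <= window:
            # e.g. "pos-1=w" means the word "w" directly precedes the target
            features.append("pos%+d=%s" % (offset, word))
    return features


class NaiveBayesWSD:
    """Multinomial Naive Bayes with add-alpha smoothing (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        for features, sense in examples:
            self.sense_counts[sense] += 1
            for feature in features:
                self.feature_counts[sense][feature] += 1
                self.vocabulary.add(feature)

    def predict(self, features):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocabulary)
        best_sense, best_score = None, float("-inf")
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)  # log prior of the sense
            denom = sum(self.feature_counts[sense].values()) + self.alpha * vocab_size
            for feature in features:
                if feature not in self.vocabulary:
                    continue  # features never seen in training are ignored
                score += math.log((self.feature_counts[sense][feature] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Invented toy training data: (sentence, index of the target word, sense label).
TRAINING = [
    ("stary zamek stoi na wzgórzu", 1, "castle"),
    ("król mieszkał w zamku nad rzeką", 3, "castle"),
    ("zamek w drzwiach się zaciął", 0, "lock"),
    ("klucz nie pasuje do zamka w drzwiach", 4, "lock"),
]

model = NaiveBayesWSD()
model.train(
    (extract_features(sentence.split(), index), sense)
    for sentence, index, sense in TRAINING
)

prediction = model.predict(
    extract_features("turyści zwiedzali zamek na wzgórzu".split(), 2)
)
print(prediction)
```

Because the inflected forms "zamku" and "zamka" are handled only by passing the target index explicitly, the sketch sidesteps lemmatization; for a richly inflected language such as Polish, context words would plausibly also benefit from being reduced to base forms, which this example does not attempt.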