The WSD development environment

Authors:
Rafał Młodzki;Adam Przepiórkowski
Affiliations:
Institute of Computer Science PAS, ul. Ordona 21, Warszawa, Poland;Institute of Computer Science PAS, ul. Ordona 21, Warszawa, Poland and University of Warsaw, Krakowskie Przedmieście, Warszawa, Poland
Venue:
LTC'09 Proceedings of the 4th conference on Human language technology: challenges for computer science and linguistics
Year:
2009

Abstract

In this paper we present the Word Sense Disambiguation Development Environment (WSDDE), a platform for testing various Word Sense Disambiguation (WSD) technologies, together with the results of the first experiments in applying the platform to WSD in Polish. The current development version of the environment facilitates the construction and evaluation of WSD methods in the supervised Machine Learning (ML) paradigm using various knowledge sources. Experiments were conducted on a small, manually sense-tagged corpus of 13 Polish words. The usual groups of features were implemented, including bag-of-words, parts of speech, and words with their positions (with different settings), in combination with popular ML algorithms (including Naive Bayes, Decision Trees and Support Vector Machines). The aim was to test to what extent standard approaches to the English WSD task may be adapted to languages with free word order and rich inflection, such as Polish. In accordance with earlier results in the literature, the initial experiments suggest that these standard approaches are relatively well suited to Polish. On the other hand, contrary to earlier findings, the experiments also show that adding some features beyond bag-of-words increases the average accuracy of the results.
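
To make the setup described in the abstract more concrete, the following is a minimal sketch of supervised WSD with bag-of-words and word-with-position features fed to a Naive Bayes classifier. It is not WSDDE's implementation: the names `extract_features` and `NaiveBayesWSD`, the window size, the add-one smoothing, and the four toy sentences for the ambiguous word "zamek" (castle / lock) are all assumptions made for illustration, and no lemmatization or part-of-speech tagging is performed.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, target_index, window=3, positional=True):
    """Collect context features around the target word (hypothetical helper).

    Emits one bag-of-words feature per context word and, if requested,
    one word-with-position feature encoding the offset from the target.
    """
    features = []
    start = max(0, target_index - window)
    end = min(len(tokens), target_index + window + 1)
    for i in range(start, end):
        if i == target_index:
            continue  # the ambiguous word itself is not a context feature
        word = tokens[i].lower()
        features.append("bow=" + word)
        if positional:
            features.append("pos%+d=%s" % (i - target_index, word))
    return features


class NaiveBayesWSD:
    """Multinomial Naive Bayes over string features with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        for features, sense in examples:
            self.sense_counts[sense] += 1
            for feature in features:
                self.feature_counts[sense][feature] += 1
                self.vocabulary.add(feature)

    def predict(self, features):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocabulary)
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)  # log prior of the sense
            denom = sum(self.feature_counts[sense].values()) + self.alpha * vocab_size
            for feature in features:
                if feature not in self.vocabulary:
                    continue  # features never seen in training are ignored
                numer = self.feature_counts[sense][feature] + self.alpha
                score += math.log(numer / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Invented toy data (not from the paper's corpus): (tokens, target index, sense).
TRAINING = [
    (["król", "zbudował", "zamek", "na", "wzgórzu"], 2, "castle"),
    (["turyści", "zwiedzali", "zamek", "nad", "rzeką"], 2, "castle"),
    (["ślusarz", "wymienił", "zamek", "w", "drzwiach"], 2, "lock"),
    (["zepsuty", "zamek", "w", "drzwiach", "wejściowych"], 1, "lock"),
]

model = NaiveBayesWSD()
model.train([(extract_features(toks, idx), sense) for toks, idx, sense in TRAINING])

test_tokens = ["ślusarz", "naprawił", "zamek", "w", "drzwiach"]
predicted = model.predict(extract_features(test_tokens, 2))
print(predicted)
```

Calling `extract_features` with `positional=False` yields a pure bag-of-words representation, so the two feature settings can be compared on the same data. Swapping in Decision Trees or Support Vector Machines would require a different classifier in place of `NaiveBayesWSD`; that is not shown here.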