Application of Different Learning Methods to Hungarian Part-of-Speech Tagging

  • Authors:
  • Tamás Horváth;Zoltán Alexin;Tibor Gyimothy;Stefan Wrobel

  • Venue:
  • ILP '99 Proceedings of the 9th International Workshop on Inductive Logic Programming
  • Year:
  • 1999

Abstract

From the point of view of computational linguistics, Hungarian is a difficult language due to its complex grammar and rich morphology. This means that even a common task such as part-of-speech tagging presents a new challenge for learning when applied to Hungarian, especially given that the language has fairly free word order. In this paper we therefore present a case study designed to illustrate the potential and limits of current ILP and non-ILP algorithms on the Hungarian POS-tagging task. We have selected the popular C4.5 and Progol systems as propositional and ILP representatives, adding experiments with our own methods: AGLEARN, a C4.5 preprocessor based on attribute grammars, and the ILP approaches PHM and RIBL. The systems were compared on the Hungarian version of the multilingual, morphosyntactically annotated MULTEXT-East TELRI corpus, which consists of about 100,000 tokens. Experimental results indicate that Hungarian POS-tagging is indeed a challenging task for learning algorithms, that even simple background knowledge leads to large differences in accuracy, and that instance-based methods are promising approaches to POS tagging for Hungarian as well. The paper also includes experiments with several cascade connections of the taggers.