Designing HMM-Based Part-of-Speech Tagger for Lithuanian Language

  • Authors:
  • Giedrė Pajarskaitė;Vilma Griciūtė;Gailius Raškinis;Jan Kuper

  • Affiliations:
  • Center of Computational Linguistics, Vytautas Magnus University, Donelaičio 52, 3000 Kaunas, Lithuania, e-mail: pajgie@lycos.com, gvilma@lycos.com, idgara@vdu.lt;Faculty of Computer Science, University of Twente, P.O. Box 217, 7500 AE Enschede, the Netherlands, e-mail: jankuper@cs.utwente.nl

  • Venue:
  • Informatica
  • Year:
  • 2004

Abstract

This paper describes a preliminary experiment in designing a Hidden Markov Model (HMM)-based part-of-speech tagger for the Lithuanian language. Part-of-speech tagging is the problem of assigning each word of a text its proper tag, given the context in which it appears. It is accomplished in two basic steps: morphological analysis and disambiguation. In this paper, we focus on disambiguation, i.e., on choosing the correct tag for each word from its set of possible tags, given the context. We constructed a stochastic disambiguation algorithm, based on supervised learning techniques, that learns the hidden Markov model's parameters from hand-annotated corpora. The Viterbi algorithm is used to assign the most probable tag to each word in the text.
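
To make the disambiguation step concrete, the following is a minimal sketch of supervised HMM training followed by Viterbi decoding. It is not the authors' implementation. The bigram (first-order) tag model, the add-alpha smoothing, the function names `train` and `viterbi`, the `<s>` start symbol, and the toy English corpus with its tag names are all assumptions made for illustration. The sketch also scores every tag for every word, whereas a real tagger would presumably restrict the candidates to the tags proposed by morphological analysis. Viterbi decoding returns the single most probable tag sequence, from which each word's tag is read off.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical sentence-start symbol

def train(tagged_sentences, alpha=1.0):
    """Estimate bigram HMM parameters by counting a hand-annotated corpus.
    Add-alpha smoothing is an assumption, not taken from the paper."""
    trans, emit = Counter(), Counter()
    ctx, tag_count = Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:
        prev = START
        for word, tag in sent:
            trans[(prev, tag)] += 1   # tag bigram count
            ctx[prev] += 1            # how often prev precedes some tag
            emit[(tag, word)] += 1    # word emitted by tag
            tag_count[tag] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(tag_count)
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unseen words

    def log_trans(prev, tag):
        return math.log((trans[(prev, tag)] + alpha) / (ctx[prev] + alpha * n_tags))

    def log_emit(tag, word):
        return math.log((emit[(tag, word)] + alpha) / (tag_count[tag] + alpha * n_words))

    return tags, log_trans, log_emit

def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for words under the bigram HMM."""
    if not words:
        return []
    # best[i][t]: log-probability of the best path ending in tag t at position i
    best = [{t: log_trans(START, t) + log_emit(t, words[0]) for t in tags}]
    back = []  # back[i][t]: best predecessor of tag t at position i + 1
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            p = max(tags, key=lambda s: best[-1][s] + log_trans(s, t))
            scores[t] = best[-1][p] + log_trans(p, t) + log_emit(t, word)
            pointers[t] = p
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy hand-annotated corpus (illustrative only).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
tags, log_trans, log_emit = train(corpus)
print(viterbi(["the", "cat", "barks"], tags, log_trans, log_emit))
```

Working in log space avoids numerical underflow on long sentences, and the smoothing keeps unseen words and unseen tag bigrams from receiving zero probability.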