Designing HMM-Based Part-of-Speech Tagger for Lithuanian Language

Authors:
Giedrė Pajarskaitė;Vilma Griciūtė;Gailius Raškinis;Jan Kuper
Affiliations:
Center of Computational Linguistics, Vytautas Magnus University, Donelaičio 52, 3000 Kaunas, Lithuania, e-mail: pajgie@lycos.com, gvilma@lycos.com, idgara@vdu.lt (Pajarskaitė, Griciūtė, Raškinis);Faculty of Computer Science, University of Twente, P.O.Box 217 7500 AE Enschede, the Netherlands, e-mail: jankuper@cs.utwente.nl (Kuper)
Venue:
Informatica
Year:
2004

Abstract

This paper describes a preliminary experiment in designing a Hidden Markov Model (HMM)-based part-of-speech tagger for the Lithuanian language. Part-of-speech tagging is the task of assigning each word in a text the tag appropriate to its context. It is accomplished in two basic steps: morphological analysis and disambiguation. In this paper, we focus on disambiguation, i.e., on choosing the correct tag for each word from its set of possible tags, given the context. We constructed a stochastic disambiguation algorithm that uses supervised learning to estimate the parameters of the hidden Markov model from hand-annotated corpora. The Viterbi algorithm is then used to find the most probable tag sequence, which assigns a tag to each word in the text.
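
The two stages named in the abstract, supervised estimation of HMM parameters and Viterbi decoding, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the model order, the smoothing method, or the tagset, so the bigram (first-order) model, the add-one smoothing, the `START` pseudo-tag, the function names, and the tiny English toy corpus below are all assumptions made for illustration. The `candidates` dictionary stands in for the output of morphological analysis.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag (an assumption)


def train(tagged_sentences):
    """Count tag-bigram and tag-word events in a hand-annotated corpus."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted word
    vocabulary = set()
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word] += 1
            vocabulary.add(word)
            previous = tag
    return transitions, emissions, sorted(emissions), vocabulary


def log_prob(counter, event, num_outcomes):
    """Add-one smoothed log probability (the smoothing is an assumption)."""
    return math.log((counter[event] + 1) / (sum(counter.values()) + num_outcomes))


def viterbi(words, candidates, model):
    """Return the most probable tag sequence for `words`.

    `candidates` maps a word to the tags allowed for it (the role played by
    morphological analysis); a word missing from it may take any known tag.
    """
    if not words:
        return []
    transitions, emissions, tags, vocabulary = model
    n_tags, n_words = len(tags), len(vocabulary) + 1  # +1 slot for unseen words
    best = {START: 0.0}  # tag -> log score of the best path ending in that tag
    backpointers = []
    for word in words:
        scores, pointers = {}, {}
        for tag in candidates.get(word, tags):
            # Best predecessor for this tag, scored by path score + transition.
            prev, score = max(
                ((p, s + log_prob(transitions[p], tag, n_tags))
                 for p, s in best.items()),
                key=lambda item: item[1],
            )
            scores[tag] = score + log_prob(emissions[tag], word, n_words)
            pointers[tag] = prev
        best = scores
        backpointers.append(pointers)
    # Follow back-pointers from the best final tag to recover the path.
    tag = max(best, key=best.get)
    path = [tag]
    for pointers in reversed(backpointers[1:]):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


# Toy hand-annotated corpus (illustrative only, not data from the paper).
corpus = [
    [("the", "DET"), ("can", "NOUN"), ("rusts", "VERB")],
    [("they", "PRON"), ("can", "AUX"), ("swim", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("can", "AUX"), ("swim", "VERB")],
]
# Hypothetical analyzer output: only "can" is ambiguous.
candidates = {"the": ["DET"], "they": ["PRON"], "can": ["NOUN", "AUX"],
              "rusts": ["VERB"], "swim": ["VERB"], "dog": ["NOUN"]}

model = train(corpus)
print(viterbi("they can swim".split(), candidates, model))
print(viterbi("the can rusts".split(), candidates, model))
```

In this toy setting the ambiguous word "can" is resolved by its neighbours: the first sentence comes out as PRON AUX VERB and the second as DET NOUN VERB. Scores are kept as log probabilities so that long sentences do not underflow.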