Morphological annotation of the Lithuanian corpus

Authors:
Vidas Daudaravičius;Erika Rimkutė;Andrius Utka
Affiliations:
Vytautas Magnus University, Kaunas, Lithuania;Vytautas Magnus University, Kaunas, Lithuania;Vytautas Magnus University, Kaunas, Lithuania
Venue:
ACL '07 Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies
Year:
2007

Abstract

As information technologies develop, large morphologically annotated corpora become a necessity for moving on to higher levels of language computerisation (e.g. automatic syntactic and semantic analysis, information extraction, machine translation). This article presents research on morphological disambiguation and on the morphological annotation of the 100-million-word Lithuanian corpus. Statistical methods have made it possible to develop an automatic morphological annotation tool for Lithuanian with a disambiguation precision of 94%. Statistical data on the distribution of parts of speech and on the most frequent wordforms and lemmas in the annotated Corpus of the Contemporary Lithuanian Language are also presented.