Morphological tagging: data vs. dictionaries
NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
MULTEXT: Multilingual Text Tools and Corpora
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Building the Croatian morphological lexicon
MorphSlav '03 Proceedings of the 2003 EACL Workshop on Morphological Processing of Slavic Languages
TermeX: A Tool for Collocation Extraction
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
OWL/DL formalization of the MULTEXT-East morphosyntactic specifications
LAW V '11 Proceedings of the 5th Linguistic Annotation Workshop
Persian in MULTEXT-East framework
FinTAL'06 Proceedings of the 5th international conference on Advances in Natural Language Processing
Hi-index | 0.00 |
Word-level morphosyntactic descriptions, such as "Ncmsn" designating a common masculine singular noun in the nominative, have been developed for all Slavic languages, yet there have been few attempts to arrive at a proposal that would be harmonised across the languages. Standardisation adds to the interchange potential of the resources, making it easier to develop multilingual applications or to evaluate language technology tools across several languages. The process of the harmonisation of morphosyntactic categories, esp. for morphologically rich Slavic languages is also interesting from a language-typological perspective. The EU Multext-East project developed corpora, lexica and tools for seven languages, with the focus being on morphosyntactic data, including formal, EAGLES-based specifications for lexical morphosyntactic descriptions. The specifications were later extended, so that they currently cover nine languages, five from the Slavic family: Bulgarian, Croatian, Czech, Serbian and Slovene. The paper presents these morphosyntactic specifications, giving their background and structure, including the encoding of the tables as TEI feature structures. The five Slavic language specifications are discussed in more depth.