Language software applications encounter new words, e.g., acronyms, technical terminology, loan words, names, or compounds of such words. Judging from English, one might assume that new words appear in base form, i.e., the lexical look-up form. However, in more highly inflecting languages such as Finnish or Swahili, only 40-50% of new words appear in base form. To index documents or discover translations in these languages, it is useful to reduce new words to their base forms as well. We often have access to analyses of more frequent words, which shape our intuition about how new words will inflect. We formalize this intuition into a probabilistic model that lemmatizes new words by analogy, i.e., guesses their base forms, and test the model on English, Finnish, Swedish, and Swahili. Depending on the language and the amount of training material, the model achieves a recall of 89-99% with an average precision of 76-94%.
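The idea of guessing base forms by analogy with already analyzed words can be illustrated with a minimal sketch. The abstract does not specify the model, so the following is not the authors' method: it assumes a simple suffix-rewrite scheme in which rules are counted from known (inflected form, base form) pairs and candidates are ranked by the length of the matched ending and then by relative frequency. All names (`suffix_rule`, `train`, `guess_lemmas`, `PAIRS`, `RULES`) and the toy English data are hypothetical.

```python
from collections import Counter, defaultdict


def suffix_rule(word, lemma):
    """Strip the longest common prefix; return (word ending, lemma ending)."""
    i = 0
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return word[i:], lemma[i:]


def train(pairs, context=2):
    """Count ending-rewrite rules from analyzed (inflected, lemma) pairs.

    Each rule is also stored with up to `context` characters of the shared
    stem, so that longer, more specific endings can be preferred later.
    This is an illustrative assumption, not the model from the paper.
    """
    rules = defaultdict(Counter)
    for word, lemma in pairs:
        old, new = suffix_rule(word, lemma)
        stem = word[: len(word) - len(old)]
        for k in range(min(context, len(stem)) + 1):
            ctx = stem[len(stem) - k:]
            rules[ctx + old][ctx + new] += 1
    return rules


def guess_lemmas(word, rules):
    """Rank candidate base forms for an unseen word.

    A candidate's score is (matched ending length, relative frequency of the
    rewrite among all rewrites seen for that ending); higher is better.
    """
    candidates = {}
    for start in range(len(word) + 1):
        ending = word[start:]
        if ending not in rules:
            continue
        total = sum(rules[ending].values())
        for replacement, count in rules[ending].items():
            lemma = word[:start] + replacement
            score = (len(ending), count / total)
            if lemma not in candidates or score > candidates[lemma]:
                candidates[lemma] = score
    return sorted(candidates, key=lambda c: candidates[c], reverse=True)


# Toy training data standing in for analyses of frequent words (hypothetical).
PAIRS = [
    ("walked", "walk"), ("talked", "talk"), ("jumped", "jump"),
    ("cats", "cat"), ("dogs", "dog"),
    ("tried", "try"), ("cried", "cry"),
]
RULES = train(PAIRS)

print(guess_lemmas("balked", RULES))
print(guess_lemmas("spied", RULES))
```

Returning a ranked list rather than a single guess mirrors the trade-off reported above: a longer candidate list raises the chance that the correct base form is included (recall), at the cost of more incorrect candidates (precision).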