Guessers for Finite-State Transducer Lexicons

Authors:
Krister Lindén
Affiliations:
Department of General Linguistics, University of Helsinki, FIN-00014
Venue:
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2009

Abstract

Language software applications encounter new words, e.g., acronyms, technical terminology, names, or compounds of such words. To add a new word to a lexicon, we need to indicate its inflectional paradigm. We present a new, generally applicable method for creating an entry generator, i.e., a paradigm guesser, for finite-state transducer lexicons. Because a guesser tends to produce numerous suggestions, it is important that the correct suggestions be among the first few candidates. We prove some formal properties of the method and evaluate it on full-scale Finnish, English, and Swedish transducer lexicons. We use the open-source Helsinki Finite-State Technology [1] to create finite-state transducer lexicons from existing lexical resources and to derive guessers for unknown words automatically. The method has a recall of 82-87% and a precision of 71-76% for the three test languages. The model needs no external corpus and can therefore serve as a baseline.
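
To make the idea of a ranked paradigm guesser concrete, the following is a minimal illustrative sketch in Python. It is not the method of the paper, which derives guessers from finite-state transducer lexicons; it swaps in a simple longest-suffix-match heuristic over a plain word list. The toy lexicon, the paradigm labels (`V-regular`, `N-s`, `N-es`), and the function names `build_suffix_index` and `guess_paradigms` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_suffix_index(lexicon, max_suffix=4):
    """Count how often each word-final suffix co-occurs with each paradigm."""
    index = defaultdict(Counter)
    for word, paradigm in lexicon:
        for n in range(1, min(max_suffix, len(word)) + 1):
            index[word[-n:]][paradigm] += 1
    return index


def guess_paradigms(word, index, max_suffix=4):
    """Return candidate paradigms for an unknown word, best guess first.

    Ranking (an assumption of this sketch): paradigms supported by a longer
    matching suffix come first; within one suffix length, the more frequent
    paradigm comes first.
    """
    ranked = []
    seen = set()
    for n in range(min(max_suffix, len(word)), 0, -1):
        counts = index.get(word[-n:], Counter())
        for paradigm, _count in counts.most_common():
            if paradigm not in seen:
                seen.add(paradigm)
                ranked.append(paradigm)
    return ranked


# Hypothetical toy lexicon of (word, paradigm label) pairs.
TOY_LEXICON = [
    ("walk", "V-regular"),
    ("talk", "V-regular"),
    ("chalk", "N-s"),
    ("box", "N-es"),
    ("fox", "N-es"),
    ("cat", "N-s"),
    ("hat", "N-s"),
]

index = build_suffix_index(TOY_LEXICON)
guesses = guess_paradigms("stalk", index)
print(guesses)
```

Like the guesser described in the abstract, this sketch needs only the lexicon itself and no external corpus, and it returns an ordered candidate list so that a user can confirm one of the first few suggestions.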