Smart paradigms and the predictability and complexity of inflectional morphology

Authors:
Grégoire Détrez; Aarne Ranta
Affiliations:
Chalmers University of Technology and University of Gothenburg; Chalmers University of Technology and University of Gothenburg
Venue:
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Year:
2012

Abstract

Morphological lexica are often implemented on top of morphological paradigms, which correspond to different ways of building the full inflection table of a word. Computationally precise lexica may use hundreds of paradigms, and it can be hard for a lexicographer to choose among them. To automate this task, this paper introduces the notion of a smart paradigm: a meta-paradigm that inspects the base form and tries to infer which low-level paradigm applies. If the result is uncertain, more forms are given for discrimination. The average number of forms needed is a measure of the predictability of an inflection system. The overall complexity of the system must also take into account the code size of the paradigm definitions themselves. This paper evaluates the smart paradigms implemented in the open-source GF Resource Grammar Library. Predictability and complexity are estimated for four languages: English, French, Swedish, and Finnish. The main result is that predictability does not decrease when the complexity of morphology grows, which means that smart paradigms provide an efficient tool for the manual construction and/or automatic bootstrapping of lexica.
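To make the smart-paradigm idea and the "average number of forms needed" measure concrete, here is a minimal sketch in Python using a toy fragment of English noun plurals. All names (`reg_noun`, `smart_noun`, `smart_noun2`, `forms_needed`, `predictability_cost`) and the suffix rules are hypothetical illustrations; they are not the actual GF Resource Grammar Library paradigms, which are written in GF and cover far more forms. The one-argument smart paradigm guesses a low-level paradigm from the singular; when that guess fails, a second form (the plural) is supplied and used to discriminate, falling back to a worst-case paradigm that stores every form.

```python
from typing import Dict, List, Tuple

# Toy inflection table: {"sg": ..., "pl": ...}. Hypothetical, for illustration only.
Table = Dict[str, str]

# --- Hypothetical low-level paradigms ---
def reg_noun(sg: str) -> Table:
    return {"sg": sg, "pl": sg + "s"}

def sibilant_noun(sg: str) -> Table:
    return {"sg": sg, "pl": sg + "es"}

def y_noun(sg: str) -> Table:
    return {"sg": sg, "pl": sg[:-1] + "ies"}

def worst_case_noun(sg: str, pl: str) -> Table:
    # Fallback: every form is given explicitly.
    return {"sg": sg, "pl": pl}

# --- Smart paradigms ---
def smart_noun(sg: str) -> Table:
    """One-argument smart paradigm: infer the low-level paradigm from the singular."""
    if sg.endswith(("s", "x", "z", "sh", "ch")):
        return sibilant_noun(sg)
    if sg.endswith("y") and len(sg) > 1 and sg[-2] not in "aeiou":
        return y_noun(sg)
    return reg_noun(sg)

def smart_noun2(sg: str, pl: str) -> Table:
    """Two-argument smart paradigm: use the plural to discriminate."""
    for paradigm in (reg_noun, sibilant_noun, y_noun):
        table = paradigm(sg)
        if table["pl"] == pl:
            return table
    return worst_case_noun(sg, pl)

# --- Predictability measure (toy version) ---
def forms_needed(sg: str, pl: str) -> int:
    """Forms a lexicographer must supply: 1 if the singular suffices, else 2."""
    return 1 if smart_noun(sg)["pl"] == pl else 2

def predictability_cost(lexicon: List[Tuple[str, str]]) -> float:
    """Average number of forms needed per lexicon entry (lower = more predictable)."""
    return sum(forms_needed(sg, pl) for sg, pl in lexicon) / len(lexicon)

lexicon = [("cat", "cats"), ("bus", "buses"), ("city", "cities"),
           ("boy", "boys"), ("mouse", "mice"), ("stomach", "stomachs")]
print(predictability_cost(lexicon))
```

In this toy lexicon, four of the six words are predicted correctly from the singular alone, while "mouse" and "stomach" need a second form, giving 8 forms over 6 words, about 1.33 on average. The sketch only models predictability; the paper's second measure, the code size of the paradigm definitions, is not represented here.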