Enhanced word decomposition by calibrating the decision threshold of probabilistic models and using a model ensemble

Authors:
Sebastian Spiegler;Peter A. Flach
Affiliations:
University of Bristol, U.K.;University of Bristol, U.K.
Venue:
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year:
2010


Abstract

This paper demonstrates that using ensemble methods and carefully calibrating the decision threshold can significantly improve the performance of machine learning methods for morphological word decomposition. We employ two algorithms from a family of generative probabilistic models. The models treat segment boundaries as hidden variables and include probabilities for letter transitions within segments. The advantage of this model family is that it can learn from small datasets and generalises easily to larger ones. The first algorithm, Promodes, which participated in Morpho Challenge 2009 (an international competition for unsupervised morphological analysis), employs a lower-order model, whereas the second algorithm, Promodes-H, is a novel development of the first that uses a higher-order model. We present the mathematical description of both algorithms, conduct experiments on the morphologically rich language Zulu, and compare the characteristics of both algorithms based on the experimental results.
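
The two ideas named in the abstract, threshold calibration and model ensembling, can be illustrated with a minimal sketch. It is not the Promodes implementation: it assumes each model has already produced a probability of a segment boundary after every letter position, that an ensemble simply averages those probabilities, and that the threshold is chosen by maximising boundary F1 on a small labelled development set. All function names, the averaging rule, and the toy probabilities are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def segment(word: str, boundary_probs: Sequence[float], threshold: float) -> List[str]:
    """Split `word` after position i whenever boundary_probs[i] > threshold.

    boundary_probs[i] is an assumed model output: the probability of a
    boundary between word[i] and word[i + 1].
    """
    if len(boundary_probs) != len(word) - 1:
        raise ValueError("need one probability per inner letter position")
    segments, start = [], 0
    for i, p in enumerate(boundary_probs):
        if p > threshold:
            segments.append(word[start:i + 1])
            start = i + 1
    segments.append(word[start:])
    return segments


def average_ensemble(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Combine several models by averaging their per-position probabilities.

    Averaging is an assumption of this sketch, not necessarily the
    combination rule used in the paper.
    """
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def boundary_f1(dev_set: Iterable[Tuple[Sequence[float], Set[int]]], threshold: float) -> float:
    """Micro-averaged F1 of predicted boundary positions against gold ones."""
    tp = fp = fn = 0
    for probs, gold in dev_set:
        predicted = {i for i, p in enumerate(probs) if p > threshold}
        tp += len(predicted & gold)
        fp += len(predicted - gold)
        fn += len(gold - predicted)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def calibrate_threshold(dev_set: List[Tuple[Sequence[float], Set[int]]],
                        candidates: Sequence[float]) -> float:
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    return max(candidates, key=lambda t: boundary_f1(dev_set, t))


# Toy development set: (boundary probabilities, gold boundary positions).
dev = [([0.1, 0.2, 0.1, 0.4, 0.3], {3}), ([0.05, 0.45, 0.1], {1})]
best = calibrate_threshold(dev, [0.5, 0.35, 0.05])
print(best, segment("walked", [0.1, 0.2, 0.1, 0.4, 0.3], best))
```

In this toy setting the default threshold of 0.5 predicts no boundaries at all, whereas a lower, calibrated threshold recovers them; this is the kind of effect the abstract attributes to calibration, shown here on invented numbers only.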