Evaluating an agglutinative segmentation model for ParaMor

Authors:
Christian Monson;Alon Lavie;Jaime Carbonell;Lori Levin
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
SigMorPhon '08 Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology
Year:
2008

Citing 7
Cited 0

Modeling and learning multilingual inflectional morphology in a minimally supervised framework

Modeling and learning multilingual inflectional morphology in a minimally supervised framework
Unsupervised learning of the morphology of a natural language

Computational Linguistics
Knowledge-free induction of inflectional morphologies

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Unsupervised learning of morphology for English and Inuktitut

NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
An algorithm for the unsupervised learning of morphology

Natural Language Engineering
Improving statistical MT through morphological analysis

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
ParaMor: minimally supervised induction of paradigm structure and morphological analysis

SigMorPhon '07 Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper describes and evaluates a modification to the segmentation model used in the unsupervised morphology induction system, ParaMor. Our improved segmentation model permits multiple morpheme boundaries in a single word. To prepare ParaMor to effectively apply the new agglutinative segmentation model, two heuristics improve ParaMor's precision. These precision-enhancing heuristics are adaptations of those used in other unsupervised morphology induction systems, including work by Hafer and Weiss (1974) and Goldsmith (2006). By reformulating the segmentation model used in ParaMor, we significantly improve ParaMor's performance in all language tracks and in both the linguistic evaluation as well as in the task based information retrieval (IR) evaluation of the peer operated competition Morpho Challenge 2007. ParaMor's improved morpheme recall in the linguistic evaluations of German, Finnish, and Turkish is higher than that of any system which competed in the Challenge. In the three languages of the IR evaluation, our enhanced ParaMor significantly outperforms, at average precision over newswire queries, a morphologically naïve baseline; scoring just behind the leading system from Morpho Challenge 2007 in English and ahead of the first place system in German.