Unsupervised morpheme discovery with UNGRADE

Authors:
Bruno Golénia;Sebastian Spiegler;Peter A. Flach
Affiliations:
Computer Science Department, University of Bristol, UK;Computer Science Department, University of Bristol, UK;Computer Science Department, University of Bristol, UK
Venue:
CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
Year:
2009

Abstract

In this paper, we present an unsupervised algorithm for morpheme discovery called UNGRADE (UNsupervised GRAph DEcomposition). UNGRADE works in three steps and can be applied to languages whose words have the structure prefixes-stem-suffixes. In the first step, a stem is obtained for each word using a sliding window, chosen such that the description length of the window is minimised. In the second step, prefix and suffix sequences are sought using a morpheme graph. The last step combines the morphemes found in the previous steps. UNGRADE has been experimentally evaluated on five languages (English, German, Finnish, Turkish and Arabic) with encouraging results.
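
The first step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the description length, so the cost below is a toy stand-in that counts the characters needed to store the lexicon when a candidate window is stored once and each word containing it uses a one-symbol pointer instead. The function names (window_counts, toy_description_length, split_word) and the example lexicon are hypothetical, and the morpheme-graph step is not shown.

```python
from collections import Counter


def window_counts(words):
    """For every substring, count how many words in the lexicon contain it."""
    counts = Counter()
    for word in words:
        # A set, so a substring is counted at most once per word.
        windows = {
            word[i:j]
            for i in range(len(word))
            for j in range(i + 1, len(word) + 1)
        }
        counts.update(windows)
    return counts


def toy_description_length(window, counts, total_chars):
    """Toy cost (an assumption, not the paper's definition).

    Characters needed for the lexicon if `window` is stored once and every
    word containing it replaces one occurrence with a one-symbol pointer.
    """
    n = counts[window]
    return total_chars - n * len(window) + n + len(window)


def split_word(word, counts, total_chars):
    """Slide a window over `word` and keep the one with the lowest toy cost.

    Returns (prefix part, stem, suffix part). Ties are broken in favour of
    the longer window, then the leftmost one.
    """
    best = None
    for start in range(len(word)):
        for end in range(start + 1, len(word) + 1):
            window = word[start:end]
            cost = toy_description_length(window, counts, total_chars)
            key = (cost, -len(window), start)
            if best is None or key < best[0]:
                best = (key, start, end)
    _, start, end = best
    return word[:start], word[start:end], word[end:]


# Hypothetical toy lexicon, chosen only to make the example readable.
lexicon = ["walk", "walks", "walked", "walking", "rewalk",
           "jumped", "jumps", "jumping"]
counts = window_counts(lexicon)
total_chars = sum(len(w) for w in lexicon)

for word in ("walked", "rewalk", "jumping"):
    print(word, split_word(word, counts, total_chars))
```

On this toy lexicon the sketch splits "walked" into an empty prefix part, the stem "walk" and the suffix part "ed", and "rewalk" into "re" plus "walk". The material left on either side of the stem is what the later steps would have to segment into individual prefixes and suffixes.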