A Bayesian model for morpheme and paradigm identification

Authors: Matthew G. Snover; Michael R. Brent
Affiliations: Washington University, St. Louis, MO (both authors)
Venue: ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Year: 2001

Abstract

This paper describes a system for unsupervised learning of morphological affixes from texts or word lists. The system is composed of a generative probability model and a search algorithm. Experiments on the Wall Street Journal and the Hansard Corpus (French and English) demonstrate the effectiveness of this approach. The results suggest that more integrated systems for learning both affixes and morphographemic adjustment rules may be feasible. In addition, several definitions and a theorem are developed so that our search algorithm can be formalized in terms of the lattice formed by subsets of suffixes under inclusion. This formalism is expected to be useful for investigating alternative search strategies over the same morphological hypothesis space.
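
The lattice mentioned in the abstract can be illustrated with a small sketch. This is not the authors' search algorithm or probability model; it only shows, under assumptions made here, what "subsets of suffixes under inclusion" looks like as a data structure. The function names (subset_lattice, upper_covers, lower_covers, stems_for), the toy word list, and the convention that the empty string stands for "no suffix" are all hypothetical choices for illustration.

```python
from itertools import combinations


def subset_lattice(suffixes):
    """Return every subset of `suffixes` as a frozenset, smallest first."""
    items = sorted(suffixes)
    return [frozenset(c)
            for k in range(len(items) + 1)
            for c in combinations(items, k)]


def upper_covers(node, suffixes):
    """Nodes directly above `node` in the lattice: add exactly one suffix."""
    return [node | {s} for s in sorted(set(suffixes) - node)]


def lower_covers(node):
    """Nodes directly below `node` in the lattice: remove exactly one suffix."""
    return [node - {s} for s in sorted(node)]


def stems_for(node, words):
    """Stems that occur in `words` with every suffix in `node`.

    Assumption for this sketch: the empty string "" means the bare stem
    itself is attested as a word.
    """
    if not node:
        return set()
    per_suffix = []
    for s in node:
        # Slice with an explicit end index so the empty suffix works too.
        stems = {w[:len(w) - len(s)]
                 for w in words
                 if w.endswith(s) and len(w) > len(s)}
        per_suffix.append(stems)
    return set.intersection(*per_suffix)


# Toy data, invented for illustration only.
words = {"walk", "walks", "walked", "talk", "talks", "jump", "jumped"}
suffixes = {"", "s", "ed"}

for node in subset_lattice(suffixes):
    print(sorted(node), sorted(stems_for(node, words)))
```

A search strategy over this space would move between a node and its upper or lower covers, scoring each candidate suffix set; how the paper scores and traverses the nodes is not specified in the abstract, so that part is left out here.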