Unsupervised models for morpheme segmentation and morphology learning

Authors:
Mathias Creutz; Krista Lagus
Affiliations:
Helsinki University of Technology, Finland; Helsinki University of Technology, Finland
Venue:
ACM Transactions on Speech and Language Processing (TSLP)
Year:
2007

Abstract

We present a model family called Morfessor for the unsupervised induction of a simple morphology from raw text data. The model is formulated in a probabilistic maximum a posteriori framework. Morfessor can handle highly inflecting and compounding languages, where words can consist of lengthy sequences of morphemes. A lexicon of word segments, called morphs, is induced from the data. The lexicon stores information about both the usage and the form of the morphs. Several instances of the model are evaluated quantitatively in a morpheme segmentation task on Finnish and English data sets of different sizes. Morfessor is shown to perform very well compared to a widely known benchmark algorithm, in particular on Finnish data.
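
To make the idea of segmenting words with a morph lexicon concrete, here is a minimal sketch in Python. It is not the Morfessor algorithm: it assumes a lexicon with morph probabilities is already given, whereas the paper induces the lexicon from raw text. The function name `segment`, the dictionary `toy_lexicon`, and all probability values are hypothetical and chosen only for illustration. The sketch uses dynamic programming to find the split of a word whose morphs have the highest product of probabilities.

```python
import math


def segment(word, morph_probs):
    """Return the most probable split of `word` into morphs, or None.

    `morph_probs` maps each known morph to a probability (assumed given here;
    an unsupervised model would have to learn it from data).
    """
    n = len(word)
    # best[i] holds (log-probability, morph list) for the best split of word[:i].
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in morph_probs and best[start][0] > -math.inf:
                score = best[start][0] + math.log(morph_probs[piece])
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + [piece])
    # An empty list means no split into known morphs exists.
    return best[n][1] or None


# Hypothetical toy lexicon; the probabilities are made up for illustration.
toy_lexicon = {"talo": 0.3, "ssa": 0.2, "ni": 0.2, "talossa": 0.01, "auto": 0.29}

print(segment("talossani", toy_lexicon))  # ['talo', 'ssa', 'ni']
```

With these made-up values, the three-morph split of the Finnish word "talossani" scores 0.3 × 0.2 × 0.2 = 0.012, which beats the two-morph split "talossa" + "ni" at 0.01 × 0.2 = 0.002. A word that cannot be covered by known morphs returns `None`.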