Large Margin Classification Using the Perceptron Algorithm
Machine Learning - The Eleventh Annual Conference on computational Learning Theory
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
Training products of experts by minimizing contrastive divergence
Neural Computation
Unsupervised learning of the morphology of a natural language
Computational Linguistics
Nonconcatenative finite-state morphology
EACL '87 Proceedings of the third conference on European chapter of the Association for Computational Linguistics
Pronunciation modeling for improved spelling correction
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Knowledge-free induction of inflectional morphologies
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Minimally supervised morphological analysis by multimodal alignment
ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Contextual dependencies in unsupervised word segmentation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A hierarchical Bayesian language model based on Pitman-Yor processes
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Context-based morphological disambiguation with random fields
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Morphemes as necessary concept for structures discovery from untagged corpora
NeMLaP3/CoNLL '98 Proceedings of the Joint Conferences on New Methods in Language Processing and Computational Natural Language Learning
Latent-variable modeling of string transductions with finite-state methods
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Learning probabilistic paradigms for morphology in a latent class model
SIGPHON '06 Proceedings of the Eighth Meeting of the ACL Special Interest Group on Computational Phonology and Morphology
ParaMor: minimally supervised induction of paradigm structure and morphological analysis
SigMorPhon '07 Proceedings of Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology
Improving morphology induction by learning spelling rules
IJCAI'09 Proceedings of the 21st international jont conference on Artifical intelligence
Graphical models over multiple strings
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
Morpho Challenge competition 2005--2010: evaluations and results
SIGMORPHON '10 Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology
A non-parametric model for the discovery of inflectional paradigms from plain text using graphical models over strings
Smart paradigms and the predictability and complexity of inflectional morphology
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Probabilistic hierarchical clustering of morphological paradigms
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
A hierarchical dirichlet process model for joint part-of-speech and morphology induction
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Name phylogeny: a generative model of string variation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Hi-index | 0.01 |
We present an inference algorithm that organizes observed words (tokens) into structured inflectional paradigms (types). It also naturally predicts the spelling of unobserved forms that are missing from these paradigms, and discovers inflectional principles (grammar) that generalize to wholly unobserved words. Our Bayesian generative model of the data explicitly represents tokens, types, inflections, paradigms, and locally conditioned string edits. It assumes that inflected word tokens are generated from an infinite mixture of inflectional paradigms (string tuples). Each paradigm is sampled all at once from a graphical model, whose potential functions are weighted finite-state transducers with language-specific parameters to be learned. These assumptions naturally lead to an elegant empirical Bayes inference procedure that exploits Monte Carlo EM, belief propagation, and dynamic programming. Given 50--100 seed paradigms, adding a 10-million-word corpus reduces prediction error for morphological inflections by up to 10%.