Stochastic Complexity in Statistical Inquiry Theory
Unsupervised language acquisition
Building a large annotated corpus of English: the penn treebank
Computational Linguistics - Special issue on using large corpora: II
Unsupervised learning of the morphology of a natural language
Computational Linguistics
A discovery procedure for certain phonological rules
ACL '84 Proceedings of the 10th International Conference on Computational Linguistics and 22nd annual meeting on Association for Computational Linguistics
Unsupervised discovery of morphemes
MPL '02 Proceedings of the ACL-02 workshop on Morphological and phonological learning - Volume 6
Learning probabilistic paradigms for morphology in a latent class model
SIGPHON '06 Proceedings of the Eighth Meeting of the ACL Special Interest Group on Computational Phonology and Morphology
A Bayesian model of natural language phonology: generating alternations from underlying forms
SigMorPhon '08 Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology
Improving morphology induction by learning spelling rules
IJCAI'09 Proceedings of the 21st International Joint Conference on Artificial Intelligence
This paper describes a Bayesian procedure for unsupervised learning of phonological rules from an unlabeled corpus of training data. Like Goldsmith's Linguistica program (Goldsmith, 2004b), whose output is taken as the starting point of this procedure, our learner returns a grammar consisting of a set of signatures, each of which pairs a set of stems with a set of suffixes. Our grammars differ from Linguistica's in that they also contain a set of phonological rules, specifically insertion, deletion and substitution rules, which allow our grammars to collapse far more words into a single signature than Linguistica can. Interestingly, the choice of Bayesian prior turns out to be crucial for obtaining a learner that makes linguistically appropriate generalizations across training corpora of different sizes.
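To make the grammar representation concrete, here is a minimal sketch of how a signature plus ordered phonological rules could generate surface words. It is not the authors' implementation and does not model the Bayesian prior or the learning procedure. Everything here is an assumption for illustration: the `Rule` and `Signature` classes, encoding each rule as a regular-expression rewrite over an underlying form `stem+suffix` (with `+` as the morpheme boundary), applying the rules in a fixed order, using `""` for the null suffix, and the example words.

```python
import re
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Rule:
    # Hypothetical encoding: a regex rewrite over the underlying form
    # "stem+suffix", where "+" marks the morpheme boundary.
    kind: str     # descriptive label only: "insertion", "deletion" or "substitution"
    pattern: str  # regex to match in the underlying form
    repl: str     # replacement string passed to re.sub

    def apply(self, form: str) -> str:
        return re.sub(self.pattern, self.repl, form)


@dataclass
class Signature:
    stems: list[str]
    suffixes: list[str]  # "" stands for the null suffix (an assumption)

    def generate(self, rules: list[Rule]) -> set[str]:
        """Return every surface word this signature yields under `rules`."""
        words = set()
        for stem, suffix in product(self.stems, self.suffixes):
            form = f"{stem}+{suffix}"
            for rule in rules:  # rules apply in list order
                form = rule.apply(form)
            words.add(form.replace("+", ""))  # erase the boundary marker
        return words


# Two illustrative English spelling rules (hypothetical, not learned output).
RULES = [
    # deletion: stem-final e is dropped before a vowel-initial suffix (bake+ing -> baking)
    Rule("deletion", r"e\+(?=[aeiou])", "+"),
    # insertion: e is inserted between a sibilant and the suffix s (box+s -> boxes)
    Rule("insertion", r"(?<=[sxz])\+(?=s$)", "+e"),
]

sig = Signature(stems=["jump", "bake", "box"], suffixes=["", "s", "ed", "ing"])
corpus = {
    "jump", "jumps", "jumped", "jumping",
    "bake", "bakes", "baked", "baking",
    "box", "boxes", "boxed", "boxing",
}

print(sig.generate(RULES) == corpus)        # True: one signature covers all 12 words
print(sorted(sig.generate([]) - corpus))    # ['bakeed', 'bakeing', 'boxs']
```

The second call shows why rules matter: without them, the same signature overgenerates forms that are not in the corpus. A rule-free grammar would therefore have to split `bake` and `box` into separate signatures, which is the kind of fragmentation the rule-augmented grammars avoid.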