During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation---for instance, "the" might be realized as [ði] or [ðə]. Previous models of acquisition have generally tackled these problems in isolation, yet behavioral evidence suggests infants acquire lexical and phonetic knowledge simultaneously. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. The model is trained on transcribed surface pronunciations and learns by bootstrapping, without access to the true lexicon. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold-standard or automatically induced word boundaries. In both cases, modeling variability improves the accuracy of the learned lexicon over a system that assumes each lexical item has a unique pronunciation.
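The log-linear pronunciation model described in the abstract can be illustrated with a toy sketch. Everything below is invented for illustration (the phone inventory, the articulatory feature vectors, and the hand-set weights are assumptions, not the paper's actual model): the score of realizing an underlying phone as a surface phone depends on which articulatory features change, and a softmax turns scores into probabilities over candidate variants.

```python
import math

# Hypothetical articulatory feature vectors for two phones: [i] and
# schwa [@]. The paper's real feature set is not reproduced here.
FEATURES = {
    "i": {"high": 1, "front": 1, "reduced": 0},
    "@": {"high": 0, "front": 0, "reduced": 1},  # schwa
}

# Hand-set illustrative weights: each penalizes changing one feature.
# In the actual model such weights would be learned from data.
WEIGHTS = {"high": -1.0, "front": -0.5, "reduced": -2.0}

def edit_score(underlying, surface):
    """Unnormalized log-linear score for realizing `underlying` as
    `surface`: sum the weight of every articulatory feature that flips."""
    u, s = FEATURES[underlying], FEATURES[surface]
    return sum(w for f, w in WEIGHTS.items() if u[f] != s[f])

def variant_prob(underlying, candidates):
    """Normalize scores with a softmax: P(surface | underlying)."""
    scores = {c: edit_score(underlying, c) for c in candidates}
    z = sum(math.exp(v) for v in scores.values())
    return {c: math.exp(v) / z for c, v in scores.items()}

# The faithful realization [i] scores higher than reduction to schwa,
# so it receives most of the probability mass.
probs = variant_prob("i", ["i", "@"])
```

Under these toy weights the faithful variant dominates; lowering the penalties would let reduced forms like [ðə] for "the" absorb more probability, which is the kind of variability the model is meant to capture.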