During language acquisition, children learn to segment speech into phonemes, syllables, morphemes, and words. We examine word segmentation specifically, and explore the possibility that children might have general-purpose chunking mechanisms to perform word segmentation. The Voting Experts (VE) and Bootstrapped Voting Experts (BVE) algorithms serve as computational models of this chunking ability. VE finds chunks by searching for a particular information-theoretic signature: low internal entropy and high boundary entropy. BVE adds to VE the ability to incorporate information about word boundaries previously found by the algorithm into future segmentations. We evaluate the general chunking model on phonemically encoded corpora of child-directed speech, and show that it is consistent with empirical results in the developmental literature. We argue that it offers a parsimonious alternative to special-purpose linguistic models.
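The information-theoretic signature mentioned above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that internal entropy is measured as the surprisal of a candidate chunk (negative log of its relative frequency) and that boundary entropy is the entropy of the symbol that follows the chunk. It also omits the standardization and voting steps a full VE implementation would need. The function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def occurrences(chunk, text):
    """Yield every start index of chunk in text, including overlapping ones."""
    start = text.find(chunk)
    while start != -1:
        yield start
        start = text.find(chunk, start + 1)


def internal_entropy(chunk, text):
    """Surprisal of the chunk (assumed definition): -log2 of its relative
    frequency among all substrings of the same length."""
    count = sum(1 for _ in occurrences(chunk, text))
    if count == 0:
        return math.inf  # an unseen chunk is maximally surprising
    total = len(text) - len(chunk) + 1
    return -math.log2(count / total)


def boundary_entropy(chunk, text):
    """Entropy (in bits) of the distribution of the symbol following the chunk."""
    followers = Counter()
    for start in occurrences(chunk, text):
        nxt = start + len(chunk)
        if nxt < len(text):
            followers[text[nxt]] += 1
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in followers.values())


# Hypothetical toy corpus with word boundaries removed.
corpus = "thedogthecatthepigthedog"

for candidate in ("the", "th", "edo"):
    print(candidate,
          internal_entropy(candidate, corpus),
          boundary_entropy(candidate, corpus))
```

In this toy corpus, "the" is frequent (low internal entropy) and is followed by several different symbols (high boundary entropy), so it matches the signature of a chunk. The fragment "th" is always followed by "e", so its boundary entropy is zero, and "edo" is rarer than "the", so its internal entropy is higher.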