An Efficient, Probabilistically Sound Algorithm for Segmentation and Word Discovery
Machine Learning - Special issue on natural language learning
Unsupervised language acquisition
Greek word segmentation using minimal information
HLT-SRWS '04 Proceedings of the Student Research Workshop at HLT-NAACL 2004
Applying collocation segmentation to the ACL anthology reference corpus
ACL '12 Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries
Several computational simulations of how children solve the word segmentation problem have been proposed, but most have been applied only to a limited number of languages. One model with some experimental support uses distributional statistics of sound sequence predictability (Saffran et al. 1996). However, the experimental design does not fully specify how predictability is best measured or modeled in a simulation. Saffran et al. (1996) assume transitional probability, but Brent (1999a) claims mutual information (MI) is more appropriate. Both assume predictability is measured locally, relative to neighboring segment-pairs. This paper replicates Brent's (1999a) mutual-information model on a corpus of child-directed speech in Modern Greek, and introduces a variant model using a global threshold. Brent's finding regarding the superiority of MI is confirmed; the relative performance of local comparisons and global thresholds depends on the evaluation metric.
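
To make the contrast in the abstract concrete, the following is a minimal sketch of the two predictability measures (transitional probability and mutual information) and the two boundary criteria (local comparison against neighboring segment-pairs, and a global threshold). It is an illustration under stated assumptions, not the implementation used in the paper: the function names, the use of pointwise MI estimated from raw counts, the restriction of local minima to utterance-internal positions, and the `threshold` parameter are all hypothetical choices made here.

```python
import math
from collections import Counter


def bigram_stats(utterances):
    """Count segments and adjacent segment-pairs within each utterance."""
    unigrams, bigrams = Counter(), Counter()
    for utt in utterances:
        unigrams.update(utt)
        bigrams.update(zip(utt, utt[1:]))
    return unigrams, bigrams


def transitional_probability(x, y, unigrams, bigrams):
    """Estimate P(y | x) as count(xy) / count(x)."""
    return bigrams[(x, y)] / unigrams[x]


def mutual_information(x, y, unigrams, bigrams):
    """Pointwise MI, log2(P(xy) / (P(x) P(y))); an assumed formulation."""
    if bigrams[(x, y)] == 0:
        return float("-inf")  # unseen pair: maximally unpredictable
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    p_xy = bigrams[(x, y)] / n_bi
    p_x = unigrams[x] / n_uni
    p_y = unigrams[y] / n_uni
    return math.log2(p_xy / (p_x * p_y))


def segment(utt, score, mode="local", threshold=0.0):
    """Split an utterance where predictability between segments is low.

    mode="local":  boundary where the pair score is lower than the scores
                   of both neighboring pairs (interior positions only,
                   a simplifying assumption).
    mode="global": boundary wherever the pair score is below `threshold`.
    """
    scores = [score(a, b) for a, b in zip(utt, utt[1:])]
    words, start = [], 0
    for i, s in enumerate(scores):
        if mode == "local":
            interior = 0 < i < len(scores) - 1
            boundary = interior and s < scores[i - 1] and s < scores[i + 1]
        else:
            boundary = s < threshold
        if boundary:
            words.append(utt[start:i + 1])
            start = i + 1
    words.append(utt[start:])
    return words
```

In use, one would compute `unigrams, bigrams = bigram_stats(corpus)` once and pass a closure such as `lambda a, b: mutual_information(a, b, unigrams, bigrams)` as `score`. Swapping in `transitional_probability` changes only the measure, and switching `mode` changes only the boundary criterion, which mirrors the two dimensions along which the models in the abstract differ.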