This paper describes a variety of non-parametric Bayesian models of word segmentation based on Adaptor Grammars that model different aspects of the input and incorporate different kinds of prior knowledge, and applies them to the Bantu language Sesotho. Although overall word segmentation accuracy on Sesotho is lower than these models achieve on English, we find interesting differences in which factors contribute to better word segmentation. Specifically, modeling contextual dependencies yields little improvement in word segmentation accuracy, whereas modeling morphological structure does improve it.
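Adaptor Grammars "adapt" selected nonterminals, so that whole previously generated subtrees (for example complete words) are cached and can be reused. The sketch below illustrates only that caching behaviour in the simplest setting: a single adapted Word nonterminal with a Dirichlet-process adaptor (the discount-zero case of the Pitman-Yor adaptors Adaptor Grammars generally use). Everything in it is an assumption for illustration, not the paper's implementation: the class `DirichletProcessAdaptor`, the helper `sequence_prob`, the base distribution P0 (geometric word length, uniform phoneme choice), the toy phoneme inventory `"abdk"` and the parameter values `alpha` and `p_stop` are all hypothetical, and no inference over segmentations is performed.

```python
from collections import Counter


class DirichletProcessAdaptor:
    """Hypothetical sketch of an adapted Word nonterminal with a Dirichlet-process adaptor.

    Each word token is either reused from the cache of previously generated
    words (probability proportional to that word's count) or generated afresh
    from a base distribution P0 (probability proportional to alpha).
    """

    def __init__(self, phonemes, alpha=1.0, p_stop=0.5):
        self.phonemes = set(phonemes)  # assumed phoneme inventory
        self.alpha = alpha             # assumed concentration parameter
        self.p_stop = p_stop           # assumed per-phoneme stop probability (geometric length)
        self.counts = Counter()        # tokens generated so far, per word type
        self.total = 0                 # total tokens generated so far

    def base_prob(self, word):
        """P0(word): geometric distribution over length times uniform phoneme choices."""
        if not word or any(ph not in self.phonemes for ph in word):
            return 0.0
        n = len(word)
        length_prob = (1.0 - self.p_stop) ** (n - 1) * self.p_stop
        return length_prob * (1.0 / len(self.phonemes)) ** n

    def predictive_prob(self, word):
        """P(word | cache) = (count(word) + alpha * P0(word)) / (total + alpha)."""
        return (self.counts[word] + self.alpha * self.base_prob(word)) / (self.total + self.alpha)

    def observe(self, word):
        """Add one token of `word` to the cache (the 'rich get richer' update)."""
        self.counts[word] += 1
        self.total += 1


def sequence_prob(adaptor, words):
    """Probability of generating `words` in order, updating a copy of the cache.

    Simplification: omits the probability of the number of words per utterance,
    so it only illustrates the effect of caching, not a full model comparison.
    """
    sim = DirichletProcessAdaptor(adaptor.phonemes, adaptor.alpha, adaptor.p_stop)
    sim.counts = adaptor.counts.copy()
    sim.total = adaptor.total
    prob = 1.0
    for w in words:
        prob *= sim.predictive_prob(w)
        sim.observe(w)
    return prob


if __name__ == "__main__":
    adaptor = DirichletProcessAdaptor(phonemes="abdk", alpha=1.0, p_stop=0.5)
    print(adaptor.predictive_prob("ba"))  # 0.015625 (from P0 alone)
    for _ in range(3):
        adaptor.observe("ba")
    print(adaptor.predictive_prob("ba"))  # 0.75390625 (mostly from the cache)
    # Reusing a cached word is far more probable than generating a new, longer one.
    print(sequence_prob(adaptor, ["ba", "ba"]) > sequence_prob(adaptor, ["baba"]))  # True
```

A larger `alpha` makes the sketch lean more on P0 and reuse cached words less. Richer grammars, such as ones capturing the contextual dependencies or morphological structure discussed above, would adapt additional nonterminals rather than Word alone.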