An Efficient, Probabilistically Sound Algorithm for Segmentation andWord Discovery
Machine Learning - Special issue on natural language learning
Inducing Probabilistic Grammars by Bayesian Model Merging
ICGI '94 Proceedings of the Second International Colloquium on Grammatical Inference and Applications
Information Theory, Inference & Learning Algorithms
Information Theory, Inference & Learning Algorithms
Monte Carlo Statistical Methods (Springer Texts in Statistics)
Monte Carlo Statistical Methods (Springer Texts in Statistics)
Contextual dependencies in unsupervised word segmentation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Improving word segmentation by simultaneously learning phonotactics
CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Learning words and their meanings from unsegmented child-directed speech
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Subword variation in text message classification
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Variational inference for adaptor grammars
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Painless unsupervised learning with features
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Blocked inference in Bayesian tree substitution grammars
ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Recession segmentation: simpler online word segmentation using limited resources
CoNLL '10 Proceedings of the Fourteenth Conference on Computational Natural Language Learning
Modeling perspective using adaptor grammars
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
An efficient algorithm for unsupervised word segmentation with branching entropy and MDL
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Unsupervised phonemic Chinese word segmentation using adaptor grammars
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Inducing Tree-Substitution Grammars
The Journal of Machine Learning Research
Insertion operator for Bayesian tree substitution grammars
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Nonparametric Bayesian machine transliteration with synchronous adaptor grammars
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Modeling infant word segmentation
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
The Journal of Machine Learning Research
Incorporating lexical priors into topic models
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
Learning semantics and selectional preference of adjective-noun pairs
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Bayesian symbol-refined tree substitution grammars for syntactic parsing
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Exploiting social information in grounded language learning via grammatical reductions
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Using rejuvenation to improve particle filtering for Bayesian word segmentation
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Unsupervised morphology rivals supervised morphology for Arabic MT
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
A regularized compression method to unsupervised word segmentation
SIGMORPHON '12 Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology
Iterative annotation transformation with predict-self reestimation for Chinese word segmentation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Pronunciation extraction from phoneme sequences through cross-lingual word-to-phoneme alignment
SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing
Statistical parsing with probabilistic symbol-refined tree substitution grammars
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Hi-index | 0.00 |
One of the reasons nonparametric Bayesian inference is attracting attention in computational linguistics is because it provides a principled way of learning the units of generalization together with their probabilities. Adaptor grammars are a framework for defining a variety of hierarchical nonparametric Bayesian models. This paper investigates some of the choices that arise in formulating adaptor grammars and associated inference procedures, and shows that they can have a dramatic impact on performance in an unsupervised word segmentation task. With appropriate adaptor grammars and inference procedures we achieve an 87% word token f-score on the standard Brent version of the Bernstein-Ratner corpus, which is an error reduction of over 35% over the best previously reported results for this corpus.