An estimate of an upper bound for the entropy of English
Computational Linguistics
Mostly-unsupervised statistical segmentation of Japanese Kanji sequences
Natural Language Engineering
Contextual dependencies in unsupervised word segmentation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
A hierarchical Bayesian language model based on Pitman-Yor processes
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Bayesian semi-supervised Chinese word segmentation for statistical machine translation
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Topic models with power-law using Pitman-Yor process
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Subword variation in text message classification
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Variational inference for adaptor grammars
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
A Bayesian method for robust estimation of distributional similarities
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
An efficient algorithm for unsupervised word segmentation with branching entropy and MDL
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Unsupervised phonemic Chinese word segmentation using adaptor grammars
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Nonparametric word segmentation for machine translation
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
A method to measure the reading difficulty of Japanese words
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
A new unsupervised approach to word segmentation
Computational Linguistics
Dialect translation: integrating Bayesian co-segmentation models with pivot-based SMT
DIALECTS '11 Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties
Non-parametric bayesian segmentation of Japanese noun phrases
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Method to build a bilingual lexicon for speech-to-speech translation systems
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Hierarchical Bayesian language modelling for the linguistically informed
EACL '12 Proceedings of the Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics
A hierarchical dirichlet process model for joint part-of-speech and morphology induction
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
A probabilistic model for canonicalizing named entity mentions
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Using rejuvenation to improve particle filtering for Bayesian word segmentation
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Unsupervized word segmentation: the case for Mandarin Chinese
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
A regularized compression method to unsupervised word segmentation
SIGMORPHON '12 Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology
Iterative annotation transformation with predict-self reestimation for Chinese word segmentation
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Drive video summarization based on double articulation structure of driving behavior
Proceedings of the 20th ACM international conference on Multimedia
A Bayesian Alignment Approach to Transliteration Mining
ACM Transactions on Asian Language Information Processing (TALIP)
Hi-index | 0.00 |
In this paper, we propose a new Bayesian model for fully unsupervised word segmentation and an efficient blocked Gibbs sampler combined with dynamic programming for inference. Our model is a nested hierarchical Pitman-Yor language model, where Pitman-Yor spelling model is embedded in the word model. We confirmed that it significantly outperforms previous reported results in both phonetic transcripts and standard datasets for Chinese and Japanese word segmentation. Our model is also considered as a way to construct an accurate word n-gram language model directly from characters of arbitrary language, without any "word" indications.