Languages constantly evolve through their users, driven by the need to communicate more efficiently. Under this hypothesis, we formulate unsupervised word segmentation as a regularized compression process. We reduce this process to an optimization problem and propose a greedy inclusion solution. Preliminary results on the Bernstein-Ratner corpus and Bakeoff-2005 show that our method is comparable to the state of the art in both effectiveness and efficiency.
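The abstract does not state the objective, so the following is only a minimal sketch of what a greedy, compression-driven segmenter can look like, under loudly stated assumptions. It starts from characters and repeatedly merges one adjacent token pair into a new token. Each candidate merge is scored as compression gain (tokens saved) minus `lam` times the resulting increase in unigram entropy (bits per token), and merging stops when no candidate scores above zero. This scoring rule, the parameters `lam`, `top_k`, and `max_iter`, and every function name are hypothetical illustrations, not the authors' formulation.

```python
from collections import Counter
import math


def entropy(counts):
    """Empirical Shannon entropy (bits per token) of a token Counter."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def token_counts(utterances):
    return Counter(tok for utt in utterances for tok in utt)


def merge_pair(utterances, pair):
    """Replace non-overlapping occurrences of `pair` (left to right) with one token."""
    a, b = pair
    out = []
    for utt in utterances:
        new, i = [], 0
        while i < len(utt):
            if i + 1 < len(utt) and utt[i] == a and utt[i + 1] == b:
                new.append(a + b)
                i += 2
            else:
                new.append(utt[i])
                i += 1
        out.append(new)
    return out


def greedy_segment(utterances, lam=1.0, max_iter=100, top_k=10):
    """Hypothetical greedy merger: gain in tokens saved, penalized by entropy increase."""
    segs = [list(u) for u in utterances]  # start from single characters
    for _ in range(max_iter):
        pairs = Counter()
        for utt in segs:
            pairs.update(zip(utt, utt[1:]))
        if not pairs:
            break
        base_len = sum(len(u) for u in segs)
        base_h = entropy(token_counts(segs))
        best_score, best_segs = 0.0, None
        # Only the top_k most frequent pairs are evaluated, to keep each step cheap.
        for pair, _ in pairs.most_common(top_k):
            cand = merge_pair(segs, pair)
            gain = base_len - sum(len(u) for u in cand)
            dh = entropy(token_counts(cand)) - base_h
            score = gain - lam * dh
            if score > best_score:
                best_score, best_segs = score, cand
        if best_segs is None:  # no merge improves the assumed objective
            break
        segs = best_segs
    return segs


if __name__ == "__main__":
    corpus = ["thedog", "thecat", "thedogran"]
    print(greedy_segment(corpus, lam=2.0, max_iter=20))
```

Because only the highest-scoring merge is accepted at each step, the procedure is greedy and may miss better segmentations that require an initially unattractive merge. Raising `lam` penalizes merges that spread probability mass over more token types.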