A key factor in high-quality word segmentation for Japanese is a high-coverage dictionary, but building such a lexical resource by hand is costly. Although external lexical resources written for human readers are potentially good knowledge sources, they have not been utilized because of differences in segmentation criteria. To supplement a morphological dictionary with these resources, we propose a new task: Japanese noun phrase segmentation. We apply non-parametric Bayesian language models to segment each noun phrase in these resources according to the statistical behavior of its supposed constituents in text. For inference, we propose a novel block sampling procedure, hybrid type-based sampling, which can directly escape a local optimum that is not too distant from the global optimum. Experiments show that the proposed method efficiently corrects the initial segmentation given by a morphological analyzer.