Words in Chinese text are not separated by delimiters, which poses a challenge for standard machine translation (MT) systems. The widely used approach in MT is to apply a Chinese word segmenter that is trained on manually annotated data and uses a fixed lexicon. Such a segmentation is not necessarily optimal for translation. We propose a Bayesian semi-supervised Chinese word segmentation model that uses both monolingual and bilingual information to derive a segmentation suitable for MT. Experiments show that our method improves a state-of-the-art MT system in both a small and a large data environment.