A systematic comparison of various statistical alignment models
Computational Linguistics
Discriminative training and maximum entropy models for statistical machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Minimum error rate training in statistical machine translation
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Chinese word segmentation as LMR tagging
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
HHMM-based Chinese lexical analyzer ICTCLAS
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17
Tree-to-string alignment template for statistical machine translation
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Scalable inference and training of context-rich syntactic translation models
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Hierarchical Phrase-Based Translation
Computational Linguistics
Bayesian semi-supervised Chinese word segmentation for statistical machine translation
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Forest-based translation rule extraction
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Using a maximum entropy model to build segmentation lattices for MT
NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Improved statistical machine translation by multiple Chinese word segmentation
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Optimizing Chinese word segmentation for machine translation performance
StatMT '08 Proceedings of the Third Workshop on Statistical Machine Translation
Unsupervised tokenization for machine translation
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Statistical machine translation into a morphologically complex language
CICLing'08 Proceedings of the 9th international conference on Computational linguistics and intelligent text processing
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Word alignment combination over multiple word segmentation
HLT-SS '11 Proceedings of the ACL 2011 Student Session
Joint models for Chinese POS tagging and dependency parsing
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Enhancing statistical machine translation with character alignment
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2
Joint Optimization for Chinese POS Tagging and Dependency Parsing
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Hi-index | 0.01 |
As tokenization is usually ambiguous for many natural languages such as Chinese and Korean, tokenization errors might potentially introduce translation mistakes for translation systems that rely on 1-best to-kenizations. While using lattices to offer more alternatives to translation systems have elegantly alleviated this problem, we take a further step to tokenize and translate jointly. Taking a sequence of atomic units that can be combined to form words in different ways as input, our joint decoder produces a tokenization on the source side and a translation on the target side simultaneously. By integrating tokenization and translation features in a discriminative framework, our joint decoder outperforms the baseline translation systems using 1-best tokenizations and lattices significantly on both Chinese-English and Korean-Chinese tasks. Interestingly, as a tokenizer, our joint decoder achieves significant improvements over monolingual Chinese tokenizers.