Improving statistical machine translation using shallow linguistic knowledge

Authors:
Young-Sook Hwang;Andrew Finch;Yutaka Sasaki
Affiliations:
ETRI Speech and Language Information Research Center, 161 Yuseong-gu, Daejeon 305-700, Republic of Korea;ATR SLC Research Labs 2-2-2 Hikaridai Seika-cho Soraku-gun Kyoto, 619-0288, Japan;ATR SLC Research Labs 2-2-2 Hikaridai Seika-cho Soraku-gun Kyoto, 619-0288, Japan
Venue:
Computer Speech and Language
Year:
2007

Citing 14
Cited 2

Class-based n-gram models of natural language

Computational Linguistics
Using POS information for statistical machine translation into morphologically rich languages

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
A syntax-based statistical translation model

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Discriminative training and maximum entropy models for statistical machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Loosely tree-based alignment for machine translation

ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Improved statistical alignment models

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
The Alignment Template Approach to Statistical Machine Translation

Computational Linguistics
Statistical Machine Translation with Scarce Resources Using Morpho-syntactic Information

Computational Linguistics
Introduction to the CoNLL-2000 shared task: chunking

ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
A phrase-based, joint probability model for statistical machine translation

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Generation of word graphs in statistical machine translation

EMNLP '02 Proceedings of the ACL-02 conference on Empirical methods in natural language processing - Volume 10
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics

HLT '02 Proceedings of the second international conference on Human Language Technology Research
Empirical study of utilizing morph-syntactic information in SMT

IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing

Combining morphosyntactic enriched representation with n-best reranking in statistical translation

SSST '07 Proceedings of the NAACL-HLT 2007/AMTA Workshop on Syntax and Structure in Statistical Translation
Incorporating Linguistic Information to Statistical Word-Level Alignment

CIARP '09 Proceedings of the 14th Iberoamerican Conference on Pattern Recognition: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

We describe methods for improving the performance of statistical machine translation (SMT) between four linguistically different languages, i.e., Chinese, English, Japanese, and Korean by using morphosyntactic knowledge. For the purpose of reducing the translation ambiguities and generating grammatically correct and fluent translation output, we address the use of shallow linguistic knowledge, that is: (1) enriching a word with its morphosyntactic features, (2) obtaining shallow linguistically-motivated phrase pairs, (3) iteratively refining word alignment using filtered phrase pairs, and (4) building a language model from morphosyntactically enriched words. Previous studies reported that the introduction of syntactic features into SMT models resulted in only a slight improvement in performance in spite of the heavy computational expense, however, this study demonstrates the effectiveness of morphosyntactic features, when reliable, discriminative features are used. Our experimental results show that word representations that incorporate morphosyntactic features significantly improve the performance of the translation model and language model. Moreover, we show that refining the word alignment using fine-grained phrase pairs is effective in improving system performance.