Empirical study of utilizing morph-syntactic information in SMT

Authors:
Young-Sook Hwang;Taro Watanabe;Yutaka Sasaki
Affiliations:
ATR SLT Research Labs, Kyoto, Japan;ATR SLT Research Labs, Kyoto, Japan;ATR SLT Research Labs, Kyoto, Japan
Venue:
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Year:
2005

Abstract

In this paper, we present an empirical study that uses morpho-syntactic information to improve translation quality. With three kinds of language pairs matched according to morpho-syntactic similarity or difference, we investigate the effects of various kinds of morpho-syntactic information, such as base form, part-of-speech, and the relative position of a word, in a statistical machine translation framework. We learn not only translation models but also word-based and class-based language models by manipulating morphological and relative positional information, and we integrate these models into a log-linear model. Experiments on multilingual translation showed that morphological information such as part-of-speech and base form is effective for improving performance in morphologically rich language pairs, and that relative positional features within a word group are useful for reordering local word order. Moreover, the use of a class-based n-gram language model improves performance by alleviating the data sparseness problem of a word-based language model.
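
As a rough illustration of the log-linear integration mentioned in the abstract, the following minimal Python sketch scores candidate translations as a weighted sum of per-model log-scores and picks the highest-scoring one. The feature names (`tm`, `word_lm`, `class_lm`), the weights, and the probabilities are hypothetical values chosen for illustration; they are not taken from the paper, and the paper's actual feature set and weight training are not shown here.

```python
import math


def log_linear_score(features, weights):
    """Weighted sum of feature log-scores: sum over m of lambda_m * h_m(e, f)."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the candidate translation with the highest log-linear score."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))


# Hypothetical log-probabilities from a translation model, a word-based
# language model, and a class-based language model (illustrative only).
candidates = [
    {
        "text": "candidate A",
        "features": {
            "tm": math.log(0.20),
            "word_lm": math.log(0.05),
            "class_lm": math.log(0.30),
        },
    },
    {
        "text": "candidate B",
        "features": {
            "tm": math.log(0.25),
            "word_lm": math.log(0.01),
            "class_lm": math.log(0.10),
        },
    },
]

# Hypothetical model weights (lambda values); in practice these would be tuned.
weights = {"tm": 1.0, "word_lm": 0.6, "class_lm": 0.4}

best = best_candidate(candidates, weights)
print(best["text"])
```

In this toy setup the class-based language model simply acts as one more feature alongside the word-based one, which is one plausible way to read how it could compensate when word-level counts are sparse.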