This paper presents a novel regression framework that models both the translational equivalence problem and the parameter estimation problem in statistical machine translation (SMT). The proposed method kernelizes the training process by formulating translation as a linear mapping between source and target word chunks (word n-grams of various lengths), which yields a regression problem with vector-valued outputs. A kernel ridge regression model and a one-class classifier, maximum margin regression, are compared, and the former is shown to perform better on this task. The experimental results demonstrate the framework's ability to handle very high-dimensional features implicitly and flexibly. However, it shares the common drawback of kernel methods: a lack of scalability. For real-world application, a more practical solution is proposed that approximates the regression hyperplane locally and linearly, using online selection of relevant training examples. In addition, we introduce a novel way to integrate language models into this machine translation framework: since a language model's n-gram representation exactly matches the definition of our feature space, the language model is incorporated as a penalty term in the objective function of the regression model.
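The core idea above can be illustrated with a minimal sketch: represent each source and target chunk as a blended n-gram count vector, then fit kernel ridge regression with vector-valued outputs, solving (K + λI)α = Y so that a new source chunk's predicted target feature vector is k(x)ᵀα. This is not the paper's implementation; the toy parallel corpus, the linear kernel, and the regularization value are illustrative assumptions.

```python
import numpy as np

def build_vocab(corpus, nmax=2):
    """Index every n-gram (n <= nmax) seen in the corpus."""
    vocab = {}
    for tokens in corpus:
        for n in range(1, nmax + 1):
            for i in range(len(tokens) - n + 1):
                g = tuple(tokens[i:i + n])
                if g not in vocab:
                    vocab[g] = len(vocab)
    return vocab

def ngram_vector(tokens, vocab, nmax=2):
    """Map a token sequence to its blended n-gram count vector."""
    v = np.zeros(len(vocab))
    for n in range(1, nmax + 1):
        for i in range(len(tokens) - n + 1):
            g = tuple(tokens[i:i + n])
            if g in vocab:
                v[vocab[g]] += 1.0
    return v

# Toy parallel corpus (hypothetical data, for illustration only)
src = [["das", "haus"], ["das", "auto"], ["ein", "haus"]]
tgt = [["the", "house"], ["the", "car"], ["a", "house"]]

sv = build_vocab(src)
tv = build_vocab(tgt)
X = np.array([ngram_vector(s, sv) for s in src])  # source features
Y = np.array([ngram_vector(t, tv) for t in tgt])  # target features (vector outputs)

# Kernel ridge regression with a linear kernel K = X X^T:
# dual coefficients alpha = (K + lam*I)^{-1} Y
lam = 0.1
K = X @ X.T
alpha = np.linalg.solve(K + lam * np.eye(len(src)), Y)

# Predict the target n-gram vector of an unseen source chunk: f(x) = k(x)^T alpha
x_new = ngram_vector(["ein", "auto"], sv)
y_pred = (X @ x_new) @ alpha
```

Decoding would then search for the target string whose n-gram vector best matches `y_pred`; because the language model is itself defined over the same n-gram space, it can be folded into that search objective as an additive penalty, as the abstract describes.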