A systematic comparison of various statistical alignment models
Computational Linguistics
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
TnT: a statistical part-of-speech tagger
ANLC '00 Proceedings of the sixth conference on Applied natural language processing
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
A localized prediction model for statistical machine translation
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
N-gram-based Machine Translation
Computational Linguistics
Novel reordering approaches in phrase-based statistical machine translation
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Hi-index | 0.00 |
Statistical machine translation (SMT) is based on alignment models which learn from bilingual corpora the word correspondences between source and target language. These models are assumed to be capable of learning reorderings. However, the difference in word order between two languages is one of the most important sources of errors in SMT. In this paper, we show that SMT can take advantage of inductive learning in order to solve reordering problems. Given a word alignment, we identify those pairs of consecutive source blocks (sequences of words) whose translation is swapped, i.e. those blocks which, if swapped, generate a correct monotonic translation. Afterwards, we classify these pairs into groups, following recursively a co-occurrence block criterion, in order to infer reorderings. Inside the same group, we allow new internal combination in order to generalize the reorder to unseen pairs of blocks. Then, we identify the pairs of blocks in the source corpora (both training and test) which belong to the same group. We swap them and we use the modified source training corpora to realign and to build the final translation system. We have evaluated our reordering approach both in alignment and translation quality. In addition, we have used two state-of-the-art SMT systems: a Phrased-based and an Ngram-based. Experiments are reported on the EuroParl task, showing improvements almost over 1 point in the standard MT evaluation metrics (mWER and BLEU).