A data mining approach to learn reorder rules for SMT

Authors:
P. V. S. Avinesh
Affiliations:
IIIT Hyderabad, Language Technologies Research Centre
Venue:
HLT-SRWS '10 Proceedings of the NAACL HLT 2010 Student Research Workshop
Year:
2010

References (11)

A statistical approach to machine translation. Computational Linguistics.
Mining association rules between sets of items in large databases. SIGMOD '93 Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data.
The theory of parsing, translation, and compiling.
Mining generalized association rules. VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases.
A systematic comparison of various statistical alignment models. Computational Linguistics.
A syntax-based statistical translation model. ACL '01 Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics.
An implementation of the FP-growth algorithm. Proceedings of the 1st International Workshop on Open Source Data Mining: Frequent Pattern Mining Implementations.
A hierarchical phrase-based model for statistical machine translation. ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics.
Clause restructuring for statistical machine translation. ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics.
Improving a statistical MT system with automatically learned rewrite patterns. COLING '04 Proceedings of the 20th International Conference on Computational Linguistics.
Statistical machine reordering. EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing.

Cited by (2)

ILLC-UvA translation system for EMNLP-WMT 2011. WMT '11 Proceedings of the Sixth Workshop on Statistical Machine Translation.
Statistical translation after source reordering: Oracles, context-aware models, and empirical analysis. Natural Language Engineering.

Abstract

In this paper, we describe a syntax-based source-side reordering method for phrase-based statistical machine translation (SMT) systems. Reordering is a common problem for language pairs of distant origin. The source side of the training corpus is first parsed, and reordering rules are learnt automatically from source-side phrases and word alignments. Finally, the source sides of the training and test corpora are reordered and passed to the SMT system. Our approach learns these reorder rules from a word-aligned parallel corpus using association rule mining; reorder rules and generalized reorder rules are its most significant components. Our experiments were conducted on the English-Hindi EILMT corpus.
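As a rough illustration of how association-rule measures can score reorder rules, the Python sketch below treats each training instance as a parse node's child-label sequence (the rule antecedent) paired with the target-side position that each child aligns to, for example its smallest aligned target index. The target order of the children (the consequent) is a permutation, and each pattern-to-permutation rule is kept only if its support and confidence clear the thresholds. This is a minimal sketch under those assumptions, not the paper's method: the names `mine_reorder_rules` and `reorder_children`, the thresholds, and the toy data are hypothetical, rules are counted directly rather than mined with Apriori or FP-growth, and rule generalization is not modeled.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]      # child constituent labels of one node, in source order
Permutation = Tuple[int, ...]  # source-child indices listed in target-side order
Rules = Dict[Pattern, Tuple[Permutation, float, float]]  # pattern -> (perm, support, confidence)


def target_permutation(target_positions: Sequence[float]) -> Permutation:
    """Order source children by where they align on the target side.

    sorted() is stable, so children with equal positions keep their source order.
    """
    return tuple(sorted(range(len(target_positions)), key=lambda i: target_positions[i]))


def mine_reorder_rules(
    instances: List[Tuple[Pattern, Sequence[float]]],
    min_support: float,
    min_confidence: float,
) -> Rules:
    """Keep pattern -> permutation rules whose support and confidence clear the thresholds."""
    n = len(instances)
    rule_counts: Counter = Counter()
    pattern_counts: Counter = Counter()
    for pattern, positions in instances:
        pattern_counts[pattern] += 1
        rule_counts[(pattern, target_permutation(positions))] += 1

    rules: Rules = {}
    for (pattern, perm), count in rule_counts.items():
        if perm == tuple(range(len(perm))):
            continue  # monotone order: nothing to reorder
        support = count / n                           # fraction of all instances
        confidence = count / pattern_counts[pattern]  # P(permutation | pattern)
        if support < min_support or confidence < min_confidence:
            continue
        # If several permutations survive for one pattern, keep the most confident one.
        if pattern not in rules or confidence > rules[pattern][2]:
            rules[pattern] = (perm, support, confidence)
    return rules


def reorder_children(labels: Pattern, children: List[str], rules: Rules) -> List[str]:
    """Permute one node's children if a mined rule matches its label sequence."""
    if labels not in rules:
        return children
    perm = rules[labels][0]
    return [children[i] for i in perm]


if __name__ == "__main__":
    # Toy English-side patterns; Hindi places objects before verbs and uses postpositions.
    instances = [
        (("VBD", "NP"), (1.0, 0.0)),
        (("VBD", "NP"), (3.0, 1.0)),
        (("VBD", "NP"), (0.0, 2.0)),  # a noisy monotone alignment
        (("IN", "NP"), (2.0, 0.0)),
        (("IN", "NP"), (5.0, 4.0)),
    ]
    rules = mine_reorder_rules(instances, min_support=0.3, min_confidence=0.6)
    print(reorder_children(("VBD", "NP"), ["saw", "the man"], rules))
```

On the toy data, the rule VBD NP → NP VBD has support 2/5 and confidence 2/3, so it survives both thresholds while the noisy monotone instance does not produce a rule, and the script prints `['the man', 'saw']`. Support filters out rare patterns, and confidence filters out patterns whose target order is inconsistent across the corpus.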