Case markers and morphology: addressing the crux of the fluency problem in English-Hindi SMT

Authors:
Ananthakrishnan Ramanathan;Hansraj Choudhary;Avishek Ghosh;Pushpak Bhattacharyya
Affiliations:
Indian Institute of Technology Bombay, Powai, Mumbai, India;Indian Institute of Technology Bombay, Powai, Mumbai, India;Indian Institute of Technology Bombay, Powai, Mumbai, India;Indian Institute of Technology Bombay, Powai, Mumbai, India
Venue:
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
Year:
2009

Abstract

This paper reports our work on accurately generating case markers and suffixes in English-to-Hindi SMT. Hindi is a relatively free word-order language and uses a comparatively rich set of case markers and morphological suffixes to represent meaning correctly. From our experience with large-scale English-Hindi MT, we are convinced that the fluency and fidelity of the Hindi output improve substantially when accurate case markers and suffixes are produced. The key question is: what on the English side encodes the information carried by case markers and suffixes on the Hindi side? Our studies of correspondences between the two languages show that Hindi case markers and suffixes are predominantly determined by the combination of suffixes and semantic relations on the English side. We therefore augment the aligned corpus with the correspondence between English suffixes and semantic relations on one side and Hindi suffixes and case markers on the other. Our results on 400 test sentences, translated using an SMT system trained on around 13,000 parallel sentences, show that suffix + semantic relation → case marker/suffix is a very useful translation factor: it makes a significant difference to output quality, as indicated by both subjective evaluation and BLEU scores.
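As a rough illustration of what augmenting the English side with suffix and semantic-relation factors could look like, the sketch below builds pipe-delimited factored tokens of the kind commonly used for factored phrase-based corpora. Everything specific here is an assumption made for the example and is not taken from the paper: the toy suffix list, the naive suffix splitter, the relation labels `agt` and `obj`, the `null` placeholder, and the `stem|suffix|relation` layout.

```python
# Hypothetical illustration only: the suffix list, relation labels, and the
# "stem|suffix|relation" layout are assumptions, not the authors' format.

SUFFIXES = ("ing", "ed", "es", "s")  # toy list, tried in this order


def split_suffix(word):
    """Return (stem, suffix) using a naive match against SUFFIXES.

    Words too short to leave a stem of at least three letters are
    returned unsplit with the placeholder suffix 'null'.
    """
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf) + 2:
            return word[: -len(suf)], suf
    return word, "null"


def factorize(tokens, relations):
    """Build factored tokens 'stem|suffix|relation' for one sentence.

    `relations` maps a token index to a semantic-relation label;
    tokens without a label get 'null'.
    """
    out = []
    for i, tok in enumerate(tokens):
        stem, suf = split_suffix(tok.lower())
        out.append(f"{stem}|{suf}|{relations.get(i, 'null')}")
    return " ".join(out)


sentence = ["boys", "ate", "mangoes"]
relations = {0: "agt", 2: "obj"}  # assumed labels for subject and object
factored = factorize(sentence, relations)
print(factored)
```

In a factored setup along these lines, the suffix and relation factors on the English side would be the signal from which the Hindi case marker or suffix is predicted; a real system would take both factors from a morphological analyzer and a semantic parser rather than from the toy rules used here.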