Hindi-to-Urdu machine translation through transliteration

Authors:
Nadir Durrani;Hassan Sajjad;Alexander Fraser;Helmut Schmid
Affiliations:
University of Stuttgart;University of Stuttgart;University of Stuttgart;University of Stuttgart
Venue:
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year:
2010

Citing 12
Cited 0

Fast and Accurate Sentence Alignment of Bilingual Corpora

AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
A systematic comparison of various statistical alignment models

Computational Linguistics
Fuzzy translation of cross-lingual spelling variants

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Statistical transliteration for english-arabic cross language information retrieval

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Machine transliteration

Computational Linguistics
Translating named entities using monolingual and bilingual resources

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Transliteration of proper names in cross-lingual information retrieval

MultiNER '03 Proceedings of the ACL 2003 workshop on Multilingual and mixed-language named entity recognition - Volume 15
A joint source-channel model for machine transliteration

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
A modified joint source-channel model for transliteration

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Hindi Urdu machine transliteration using finite-state transducers

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Integration of an Arabic transliteration module into a statistical machine translation system

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present a novel approach to integrate transliteration into Hindi-to-Urdu statistical machine translation. We propose two probabilistic models, based on conditional and joint probability formulations, that are novel solutions to the problem. Our models consider both transliteration and translation when translating a particular Hindi word given the context whereas in previous work transliteration is only used for translating OOV (out-of-vocabulary) words. We use transliteration as a tool for disambiguation of Hindi homonyms which can be both translated or transliterated or transliterated differently based on different contexts. We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system. This indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu.