Experiences with English-Hindi, English-Tamil and English-Kannada transliteration tasks at NEWS 2009

Authors:
Manoj Kumar Chinnakotla; Om P. Damani
Affiliations:
IIT Bombay, Mumbai, India; IIT Bombay, Mumbai, India
Venue:
NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Year:
2009

Abstract

We apply a phrase-based Statistical Machine Translation (SMT) approach to transliteration, in which characters take the place of words and words take the place of sentences. We use standard SMT tools: GIZA++ for learning alignments, and Moses for learning the phrase tables and decoding. Besides tuning the standard SMT parameters, we focus on tuning the parameters of the Character Sequence Model (CSM): its order, the weight assigned to it during decoding, and the corpus used to estimate it. Our results show that careful tuning of the CSM increases transliteration accuracy.
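
The idea of treating a word as a "sentence" of characters, and of weighting a character sequence model during decoding, can be illustrated with a small sketch. This is not the authors' pipeline and does not call GIZA++ or Moses; it is a toy Python stand-in under stated assumptions. The names `to_char_tokens`, `CharSequenceModel`, `rescore` and `csm_weight` are hypothetical, the add-one smoothing is a simplification chosen for brevity, and splitting by code point is only a rough approximation for Indic scripts, where combining vowel signs may need to stay attached to their consonants.

```python
import math
from collections import Counter


def to_char_tokens(word):
    """Render a word as a space-separated character 'sentence'."""
    return " ".join(word)


class CharSequenceModel:
    """Toy character n-gram model with add-one smoothing (illustrative only)."""

    def __init__(self, order):
        self.order = order          # n-gram order, one of the tuned parameters
        self.ngrams = Counter()     # counts of (context + next character)
        self.contexts = Counter()   # counts of each context
        self.vocab = {"</s>"}       # symbols the model can predict

    def _pad(self, word):
        # Pad with start symbols so the first character has a full context.
        return ["<s>"] * (self.order - 1) + list(word) + ["</s>"]

    def train(self, words):
        # 'words' plays the role of the corpus used for CSM estimation.
        for word in words:
            chars = self._pad(word)
            self.vocab.update(word)
            for i in range(self.order - 1, len(chars)):
                context = tuple(chars[i - self.order + 1:i])
                self.ngrams[context + (chars[i],)] += 1
                self.contexts[context] += 1

    def logprob(self, word):
        """Natural-log probability of the word under the n-gram model."""
        chars = self._pad(word)
        total = 0.0
        for i in range(self.order - 1, len(chars)):
            context = tuple(chars[i - self.order + 1:i])
            num = self.ngrams[context + (chars[i],)] + 1
            den = self.contexts[context] + len(self.vocab)
            total += math.log(num / den)
        return total


def rescore(candidates, csm, csm_weight):
    """Return the (word, channel_score) pair that maximises
    channel_score + csm_weight * csm.logprob(word)."""
    return max(candidates, key=lambda c: c[1] + csm_weight * csm.logprob(c[0]))


# Hypothetical usage: a bigram CSM estimated from three target-side names.
csm = CharSequenceModel(order=2)
csm.train(["rama", "ramu", "raman"])
candidates = [("amra", -1.0), ("rama", -1.2)]  # made-up channel scores
best_without_csm = rescore(candidates, csm, csm_weight=0.0)
best_with_csm = rescore(candidates, csm, csm_weight=1.0)
```

With the CSM weight set to zero, the choice depends only on the made-up channel scores; with a positive weight, the candidate whose character sequence looks more like the training names is preferred. The order, the weight and the training corpus here correspond loosely to the three CSM parameters named in the abstract.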