Transliteration using a phrase-based statistical machine translation system to re-score the output of a joint multigram model

Authors:
Andrew Finch;Eiichiro Sumita
Affiliations:
Keihanna Science City, Japan;Keihanna Science City, Japan
Venue:
NEWS '10 Proceedings of the 2010 Named Entities Workshop
Year:
2010

Abstract

The system presented in this paper uses a combination of two techniques to transliterate directly from grapheme to grapheme. The technique makes no language-specific assumptions and uses no dictionaries or explicit phonetic information; the process transforms sequences of tokens in the source language directly into sequences of tokens in the target language. All the language pairs in our experiments were transliterated by applying this technique in a single unified manner. Our approach uses hypothesis rescoring to integrate the models of two state-of-the-art techniques: phrase-based statistical machine translation (SMT) and a joint multigram model. The joint multigram model was used to generate an n-best list of transliteration hypotheses, which were then re-scored using the models of the phrase-based SMT system. The scores from both models for each hypothesis were linearly interpolated to produce a final hypothesis score that was used to re-rank the hypotheses. In our experiments on development data, the combined system substantially outperformed both of its component systems.
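
The re-ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `rerank`, the interpolation weight `weight`, the use of log-probability-like scores, and the toy hypotheses and score values are all assumptions made for illustration. The abstract states only that the two models' scores were linearly interpolated and the result used to re-rank the n-best list.

```python
from typing import Callable, List, Tuple


def rerank(
    nbest: List[Tuple[str, float]],
    smt_score: Callable[[str], float],
    weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Re-rank an n-best list by linearly interpolating two model scores.

    nbest     -- (hypothesis, joint multigram score) pairs
    smt_score -- hypothetical callback returning the SMT score of a hypothesis
    weight    -- assumed interpolation weight given to the joint multigram score
    """
    combined = [
        (hyp, weight * jm_score + (1.0 - weight) * smt_score(hyp))
        for hyp, jm_score in nbest
    ]
    # Higher combined score is better, so sort in descending order.
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


# Toy data (invented values): the joint multigram model prefers "sumita",
# while the stand-in SMT scores prefer "sumitta".
nbest = [("sumita", -1.0), ("sumitta", -1.2), ("smita", -3.0)]
toy_smt_scores = {"sumita": -2.5, "sumitta": -1.5, "smita": -4.0}

reranked = rerank(nbest, toy_smt_scores.__getitem__, weight=0.5)
for hypothesis, score in reranked:
    print(f"{hypothesis}\t{score:.2f}")
```

In this toy setting the interpolated score moves "sumitta" above "sumita", showing how the second model can change the top hypothesis. In practice the weight would presumably be tuned on development data; the abstract does not say how it was set.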