Improving machine transliteration performance by using multiple transliteration models

  • Authors:
  • Jong-Hoon Oh; Key-Sun Choi; Hitoshi Isahara

  • Affiliations:
  • Computational Linguistics Group, National Institute of Information and Communications Technology (NICT), Kyoto, Japan; Computer Science Division, EECS, KAIST, Daejeon, Republic of Korea; Computational Linguistics Group, National Institute of Information and Communications Technology (NICT), Kyoto, Japan

  • Venue:
  • ICCPOL '06: Proceedings of the 21st International Conference on Computer Processing of Oriental Languages: Beyond the Orient: The Research Challenges Ahead
  • Year:
  • 2006

Abstract

Machine transliteration has received significant attention as a supporting tool for machine translation and cross-language information retrieval. During the last decade, four kinds of transliteration models have been studied: the grapheme-based model, the phoneme-based model, the hybrid model, and the correspondence-based model. These models are classified by their information sources, or the units to be transliterated: source graphemes, source phonemes, both source graphemes and source phonemes, and the correspondence between source graphemes and phonemes, respectively. Although each transliteration model has shown relatively good performance, no single model can handle the full range of complex transliteration behaviors. To address this problem, we combined different transliteration models under a "generate transliterations, then validate them" strategy. This strategy makes it possible to cover complex transliteration behaviors by exploiting the strengths of each model, and to improve transliteration performance by validating the generated transliterations. Our method uses both web-based and transliteration-model-based validation. Experiments showed that our method outperforms both the individual transliteration models and previous work.
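
The "generate, then validate" strategy described in the abstract can be illustrated with a short sketch. The Python below is not the authors' implementation: the per-model candidate generators, the `web_score` and `model_score` functions, and the weight `alpha` are hypothetical placeholders standing in for the paper's grapheme-, phoneme-, hybrid-, and correspondence-based models and its web-based and model-based validation.

```python
# A minimal sketch, assuming hypothetical model generators and two validation
# scores (web-based evidence and a transliteration-model score).
from typing import Callable, Dict, List, Tuple


def combine_and_validate(
    source_word: str,
    models: List[Callable[[str], List[str]]],  # each model proposes candidate transliterations
    web_score: Callable[[str, str], float],    # assumed: web-based evidence for (source, candidate)
    model_score: Callable[[str, str], float],  # assumed: score from the transliteration models
    alpha: float = 0.5,                        # assumed weight between the two validation scores
) -> List[Tuple[str, float]]:
    """Generate candidates with every model, then rank them by a combined validation score."""
    # Step 1: generation -- pool the candidates produced by all transliteration models.
    candidates = set()
    for model in models:
        candidates.update(model(source_word))

    # Step 2: validation -- score each candidate with web-based and model-based evidence.
    scored: Dict[str, float] = {}
    for cand in candidates:
        scored[cand] = alpha * web_score(source_word, cand) + (1 - alpha) * model_score(source_word, cand)

    # Return the candidates ranked by combined validation score, best first.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

The point of the sketch is only the control flow: candidates are pooled from several models so that each model's strengths contribute, and a separate validation stage filters and ranks the pooled candidates before any transliteration is accepted.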