Everybody loves a rich cousin: an empirical study of transliteration through bridge languages

Authors:
Mitesh M. Khapra;A. Kumaran;Pushpak Bhattacharyya
Affiliations:
Indian Institute of Technology, Bombay, Powai, Mumbai, India;Microsoft Research India, Bangalore, India;Indian Institute of Technology, Bombay, Powai, Mumbai, India
Venue:
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2010

Citing 12
Cited 2

Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A systematic comparison of various statistical alignment models

Computational Linguistics
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
Machine transliteration

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
An English to Korean transliteration model of extended Markov window

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
An English-Korean transliteration model using pronunciation and contextual rules

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Pivot language approach for phrase-based statistical machine translation

Machine Translation
"They Are Out There, If You Know Where to Look": Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval

ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
Translating names and technical terms in Arabic text

Semitic '98 Proceedings of the Workshop on Computational Approaches to Semitic Languages
Improved statistical machine translation for resource-poor languages using related resource-rich languages

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Whitepaper of NEWS 2009 machine transliteration shared task

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
How do named entities contribute to retrieval effectiveness?

CLEF'04 Proceedings of the 5th conference on Cross-Language Evaluation Forum: multilingual Information Access for Text, Speech and Images

Machine transliteration: leveraging on third languages

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Leveraging supplemental representations for sequential transduction

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Quantified Score

Hi-index	0.00

Visualization

Abstract

Most state of the art approaches for machine transliteration are data driven and require significant parallel names corpora between languages. As a result, developing transliteration functionality among n languages could be a resource intensive task requiring parallel names corpora in the order of nC2. In this paper, we explore ways of reducing this high resource requirement by leveraging the available parallel data between subsets of the n languages, transitively. We propose, and show empirically, that reasonable quality transliteration engines may be developed between two languages, X and Y, even when no direct parallel names data exists between them, but only transitively through language Z. Such systems alleviate the need for O(nC2) corpora, significantly. In addition we show that the performance of such transitive transliteration systems is in par with direct transliteration systems, in practical applications, such as CLIR systems.