Compositional Machine Transliteration

Authors:
A. Kumaran;Mitesh M. Khapra;Pushpak Bhattacharyya
Affiliations:
Microsoft Research India;Indian Institute of Technology Bombay;Indian Institute of Technology Bombay
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2010

Citing 23
Cited 1

Improving cross language retrieval with triangulated translation

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
A systematic comparison of various statistical alignment models

Computational Linguistics
Ultraconservative online algorithms for multiclass problems

The Journal of Machine Learning Research
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
An English to Korean transliteration model of extended Markov window

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Empirical studies on the impact of lexical resources on CLIR performance

Information Processing and Management: an International Journal - Special issue: Cross-language information retrieval
The effect of named entities on effectiveness in cross-language information retrieval evaluation

Proceedings of the 2005 ACM symposium on Applied computing
An English-Korean transliteration model using pronunciation and contextual rules

COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Translating named entities using monolingual and bilingual resources

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Experiments with transitive dictionary translation and pseudo-relevance feedback using graded relevance assessments

Journal of the American Society for Information Science and Technology
Combining probability models and web mining models: a framework for proper name transliteration

Information Technology and Management
Pivot language approach for phrase-based statistical machine translation

Machine Translation
"They Are Out There, If You Know Where to Look": Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval

ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
Hindi Urdu machine transliteration using finite-state transducers

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Translating names and technical terms in Arabic text

Semitic '98 Proceedings of the Workshop on Computational Approaches to Semitic Languages
Improved statistical machine translation for resource-poor languages using related resource-rich languages

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3
Report of NEWS 2009 machine transliteration shared task

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Whitepaper of NEWS 2009 machine transliteration shared task

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
DirecTL: a language-independent approach to transliteration

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Named entity transcription with pair n-gram models

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Machine transliteration using target-language grapheme and phoneme: multi-engine transliteration approach

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
How do named entities contribute to retrieval effectiveness?

CLEF'04 Proceedings of the 5th conference on Cross-Language Evaluation Forum: multilingual Information Access for Text, Speech and Images

Leveraging supplemental representations for sequential transduction

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Quantified Score

Hi-index	0.00

Visualization

Abstract

Machine transliteration is an important problem in an increasingly multilingual world, as it plays a critical role in many downstream applications, such as machine translation or crosslingual information retrieval systems. In this article, we propose compositional machine transliteration systems, where multiple transliteration components may be composed either to improve existing transliteration quality, or to enable transliteration functionality between languages even when no direct parallel names corpora exist between them. Specifically, we propose two distinct forms of composition: serial and parallel. Serial compositional system chains individual transliteration components, say, X → Y and Y → Z systems, to provide transliteration functionality, X → Z. In parallel composition evidence from multiple transliteration paths between X → Z are aggregated for improving the quality of a direct system. We demonstrate the functionality and performance benefits of the compositional methodology using a state-of-the-art machine transliteration framework in English and a set of Indian languages, namely, Hindi, Marathi, and Kannada. Finally, we underscore the utility and practicality of our compositional approach by showing that a CLIR system integrated with compositional transliteration systems performs consistently on par with, and sometimes better than, that integrated with a direct transliteration system.