Transliteration for Resource-Scarce Languages

Authors:
Manoj K. Chinnakotla; Om P. Damani; Avijit Satoskar
Affiliations:
Indian Institute of Technology Bombay; Indian Institute of Technology Bombay; Indian Institute of Technology Bombay
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2010

Abstract

Today, parallel corpus-based systems dominate the transliteration landscape. However, resource-scarce languages do not enjoy the luxury of a large parallel transliteration corpus. For these languages, rule-based transliteration is the only viable option. In this article, we show that by properly harnessing monolingual resources in conjunction with a manually created rule base, one can achieve reasonable transliteration performance. We achieve this performance by exploiting the power of Character Sequence Modeling (CSM), which requires only monolingual resources. We present the results of our rule-based system for Hindi-to-English, English-to-Hindi, and Persian-to-English transliteration tasks. We also perform an extrinsic evaluation of transliteration systems in the context of Cross-Lingual Information Retrieval. Another important contribution of our work is to explain the widely varying accuracy numbers reported in the transliteration literature in terms of the entropy of the language pairs and the datasets involved.
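
As a rough illustration of how character sequence modeling can help a rule-based system using only monolingual data, the sketch below trains a character bigram model on a target-language word list and uses it to rank candidate transliterations. It is not the authors' system: the bigram order, the add-one smoothing, the toy word list, the candidate strings, and the function names (`train_char_bigram`, `log_prob`, `rank`) are all assumptions made for illustration. In practice the candidates would come from a manually created rule base.

```python
import math
from collections import Counter


def train_char_bigram(words):
    """Count character bigrams over a monolingual word list.

    '^' and '$' are hypothetical word-boundary markers.
    """
    bigrams, unigrams, alphabet = Counter(), Counter(), set()
    for word in words:
        chars = ["^"] + list(word.lower()) + ["$"]
        alphabet.update(chars)
        for a, b in zip(chars, chars[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return bigrams, unigrams, alphabet


def log_prob(word, model):
    """Length-normalised log-probability with add-one smoothing."""
    bigrams, unigrams, alphabet = model
    vocab = len(alphabet) + 1  # +1 reserves mass for unseen characters
    chars = ["^"] + list(word.lower()) + ["$"]
    total = 0.0
    for a, b in zip(chars, chars[1:]):
        total += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
    return total / (len(chars) - 1)


def rank(candidates, model):
    """Order candidates from most to least plausible under the model."""
    return sorted(candidates, key=lambda c: log_prob(c, model), reverse=True)


# Toy monolingual resource (assumed); a real system would use a large word list.
monolingual_words = ["delhi", "mumbai", "chennai", "kolkata",
                     "india", "hindi", "nehru", "gandhi"]
model = train_char_bigram(monolingual_words)

# Hypothetical rule-base outputs for a single source word.
candidates = ["dlhi", "dqlhx", "delhi"]
for candidate in rank(candidates, model):
    print(candidate, round(log_prob(candidate, model), 3))
```

The point of the design is that the model never sees a parallel word pair: it only learns which character sequences look natural in the target language, so it can demote implausible outputs of the rule base.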