Whitepaper of NEWS 2010 shared task on transliteration mining

Authors:
A. Kumaran;Mitesh M. Khapra;Haizhou Li
Affiliations:
Microsoft Research India, Bangalore, India;Indian Institute of Technology-Bombay, Mumbai, India;Institute for Infocomm Research, Singapore
Venue:
NEWS '10 Proceedings of the 2010 Named Entities Workshop
Year:
2010

Cited by:
Report of NEWS 2010 transliteration mining shared task (NEWS '10 Proceedings of the 2010 Named Entities Workshop)
An algorithm for unsupervised transliteration mining with an application to word alignment (HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1)
A statistical model for unsupervised and semi-supervised transliteration mining (ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1)
A Bayesian Alignment Approach to Transliteration Mining (ACM Transactions on Asian Language Information Processing (TALIP))

Abstract

Transliteration is generally defined as the phonetic translation of names across languages. Machine transliteration is a critical technology in many domains, such as machine translation and cross-language information retrieval and extraction. Recent research has shown that high-quality machine transliteration systems may be developed in a language-neutral manner, using a reasonably sized, good-quality corpus (~15-25K parallel names) between a given pair of languages. In this shared task, we focus on the acquisition of such good-quality names corpora in many languages, thus complementing the machine transliteration shared task conducted concurrently at the same NEWS 2010 workshop. Specifically, this task focuses on mining the Wikipedia paired-entities data (also known as inter-wiki links) to produce high-quality transliteration data that may be used for transliteration tasks.