Transliteration mining with phonetic conflation and iterative training

Authors:
Kareem Darwish
Affiliations:
Cairo Microsoft Innovation Center, Cairo, Egypt
Venue:
NEWS '10 Proceedings of the 2010 Named Entities Workshop
Year:
2010

Citing 11
Cited 7

Extracting named entity translingual equivalence with limited resources

ACM Transactions on Asian Language Information Processing (TALIP)
Acquisition of English-Chinese transliterated word pairs from parallel-aligned texts using a statistical machine transliteration model

HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
Learning transliteration lexicons from the web

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Named entity transliteration and discovery from multilingual comparable corpora

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Mining the Web for Transliteration Lexicons: Joint-Validation Approach

WI '06 Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence
A phonetic similarity model for automatic extraction of transliteration pairs

ACM Transactions on Asian Language Information Processing (TALIP)
"They Are Out There, If You Know Where to Look": Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval

ECIR '09 Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval
MINT: a method for effective and scalable mining of named entity transliterations from large comparable corpora

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
Using word dependent transition models in HMM based word alignment for statistical machine translation

StatMT '07 Proceedings of the Second Workshop on Statistical Machine Translation
Automated mining of names using parallel Hindi-English corpus

ALR7 Proceedings of the 7th Workshop on Asian Language Resources
Transliteration equivalence using canonical correlation analysis

ECIR'2010 Proceedings of the 32nd European conference on Advances in Information Retrieval

Report of NEWS 2010 transliteration mining shared task

NEWS '10 Proceedings of the 2010 Named Entities Workshop
Is a query worth translating: ask the users!

ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
An algorithm for unsupervised transliteration mining with an application to word alignment

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Improved transliteration mining using graph reinforcement

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Transliteration mining using large training and test sets

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
A statistical model for unsupervised and semi-supervised transliteration mining

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
A Bayesian Alignment Approach to Transliteration Mining

ACM Transactions on Asian Language Information Processing (TALIP)

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents transliteration mining on the ACL 2010 NEWS workshop shared transliteration mining task data. Transliteration mining was done using a generative transliteration model applied on the source language and whose output was constrained on the words in the target language. A total of 30 runs were performed on 5 language pairs, with 6 runs for each language pair. In the presence of limited resources, the runs explored the use of phonetic conflation and iterative training of the transliteration model to improve recall. Using letter conflation improved recall by as much as 48%, with improvements in recall dwarfing drops in precision. Using iterative training improved recall, but often at the cost of significant drops in precision. The best runs typically used both letter conflation and iterative learning.