Automated mining of names using parallel Hindi-English corpus

Authors: R. Mahesh K. Sinha
Affiliations: Indian Institute of Technology, Kanpur, India
Venue: ALR7 Proceedings of the 7th Workshop on Asian Language Resources
Year: 2009

Abstract

Machine transliteration has applications in many natural language processing tasks, such as machine translation, information retrieval, and question answering. Automated learning of machine transliteration requires a large parallel corpus of names in two scripts. In this paper we present a simple yet powerful method for automatically mining Hindi-English names from a parallel corpus. The method achieves an average precision of 93% and recall of 85% in mining proper names, and it works even with a small corpus. We compare our results with the GIZA++ word alignment tool, which yields 30% precision and 63% recall on the same corpora. We also demonstrate that the same method of name mining works for other Indian languages as well.
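
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how precision and recall are typically computed for a set of mined name pairs. It assumes exact matching against a hand-built gold set; the function name `precision_recall` and the toy data are hypothetical and are not taken from the paper, whose actual evaluation protocol may differ.

```python
def precision_recall(mined, gold):
    """Return (precision, recall) of mined name pairs against a gold set.

    Assumption: a mined pair counts as correct only if it matches a gold
    pair exactly. This is an illustrative convention, not the paper's.
    """
    mined, gold = set(mined), set(gold)
    correct = len(mined & gold)
    # Precision: fraction of mined pairs that are correct.
    precision = correct / len(mined) if mined else 0.0
    # Recall: fraction of gold pairs that were found.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data (Hindi name, English name), for illustration only.
gold = {
    ("दिल्ली", "Delhi"),
    ("कानपुर", "Kanpur"),
    ("मुंबई", "Mumbai"),
    ("सिन्हा", "Sinha"),
}
mined = {
    ("दिल्ली", "Delhi"),
    ("कानपुर", "Kanpur"),
    ("मुंबई", "Mumbai"),
    ("और", "Aur"),  # a spurious pair: a common word, not a name
}

p, r = precision_recall(mined, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case three of the four mined pairs are correct and three of the four gold pairs are found, so both scores are 0.75. A miner that proposes many spurious pairs lowers precision without necessarily changing recall, which is the pattern the abstract reports for the word-alignment baseline.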