Extracting named entity translingual equivalence with limited resources

Authors:
Fei Huang; Stephan Vogel; Alex Waibel
Affiliations:
Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA
Venue:
ACM Transactions on Asian Language Information Processing (TALIP)
Year:
2003

Abstract

In this article we present an automatic approach to extracting Hindi-English (H-E) Named Entity (NE) translingual equivalences from bilingual parallel corpora. In the absence of a Hindi NE tagger or H-E translation dictionary, this approach adapts a Chinese-English (C-E) surface string transliteration model for H-E NE extraction. The model is initially trained using automatically extracted C-E NE pairs, then iteratively updated based on newly extracted H-E NE pairs. For each English person and location NE in each sentence pair, this approach searches for its Hindi correspondence with minimum transliteration cost and constructs an H-E NE list from the bilingual corpus. Experiments show that this approach extracted 1000 H-E NE pairs with a precision of 91.8%.
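
The search step described in the abstract (for each English NE, find the Hindi word span with minimum transliteration cost) can be illustrated with a minimal sketch. The abstract does not specify the authors' cost model, so this example substitutes a length-normalised weighted edit distance over romanized strings. The function names `transliteration_cost` and `best_match`, the `sub_cost` table, the `max_len` window, and the romanized Hindi tokens are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

SubCost = Dict[Tuple[str, str], float]


def transliteration_cost(src: str, tgt: str,
                         sub_cost: Optional[SubCost] = None) -> float:
    """Length-normalised weighted edit distance (a stand-in cost model)."""
    sub_cost = sub_cost or {}
    m, n = len(src), len(tgt)
    # dp[i][j] = cheapest way to turn src[:i] into tgt[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = float(i)
    for j in range(1, n + 1):
        dp[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = src[i - 1], tgt[j - 1]
            # Identical characters are free; other pairs use the table
            # if present, else a default cost of 1.
            sub = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dp[i][j] = min(dp[i - 1][j] + 1.0,      # delete from src
                           dp[i][j - 1] + 1.0,      # insert into src
                           dp[i - 1][j - 1] + sub)  # substitute / match
    return dp[m][n] / max(m, n, 1)


def best_match(english_ne: str, hindi_tokens: List[str], max_len: int = 3,
               sub_cost: Optional[SubCost] = None) -> Tuple[str, float]:
    """Return the token span (up to max_len words) with minimum cost."""
    target = english_ne.lower().replace(" ", "")
    best: Tuple[str, float] = ("", float("inf"))
    for start in range(len(hindi_tokens)):
        for length in range(1, max_len + 1):
            if start + length > len(hindi_tokens):
                break
            span = hindi_tokens[start:start + length]
            cost = transliteration_cost(target, "".join(span).lower(), sub_cost)
            if cost < best[1]:
                best = (" ".join(span), cost)
    return best


# Hypothetical romanized Hindi sentence, used only for illustration.
tokens = ["pradhanmantri", "atal", "bihari", "vajpayee",
          "ne", "dilli", "mein", "kaha"]
print(best_match("Atal Bihari Vajpayee", tokens))  # ('atal bihari vajpayee', 0.0)
print(best_match("Delhi", tokens))
```

In this sketch, the iterative update mentioned in the abstract would correspond to re-estimating the entries of `sub_cost` from newly extracted pairs and rerunning the search. Lowering the cost of pairs such as `("e", "i")` would make "Delhi" and "dilli" match more cheaply on later passes.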