Learning phonetic similarity for matching named entity translations and mining new translations

Authors:
Wai Lam;Ruizhang Huang;Pik-Shan Cheung
Affiliations:
The Chinese University of Hong Kong, Shatin, Hong Kong;The Chinese University of Hong Kong, Shatin, Hong Kong;The Chinese University of Hong Kong, Shatin, Hong Kong
Venue:
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2004

Abstract

We propose a novel named entity matching model which considers both semantic and phonetic clues. The matching is formulated as an optimization problem. One major component is a phonetic matching model which exploits similarity at the phoneme level. We investigate three learning algorithms for obtaining the similarity information of basic phoneme units based on training examples. By applying this proposed named entity matching model, we also develop a mining framework for discovering new, unseen named entity translations from online daily Web news. This framework harvests comparable news in different languages using an existing bilingual dictionary. It is able to discover new name translations not found in the dictionary.
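
The abstract does not spell out the optimization formulation or the learned similarity values, so the following is only an illustrative sketch of what phoneme-level matching could look like. It assumes a hypothetical similarity table (`SIMILARITY`, with made-up values), names are given as lists of phoneme symbols, and the score is the best order-preserving one-to-one alignment found by dynamic programming. These are assumptions for illustration and not the authors' model.

```python
# Hypothetical phoneme-pair similarities in [0, 1]. The values are
# illustrative only and are NOT taken from the paper; in the paper such
# values would be learned from training examples.
SIMILARITY = {
    ("b", "p"): 0.7,
    ("l", "r"): 0.6,
    ("d", "t"): 0.7,
}


def phoneme_similarity(a, b):
    """Symmetric lookup: identical phonemes score 1, unlisted pairs score 0."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def phonetic_match_score(source, target):
    """Score two phoneme sequences by their best order-preserving alignment.

    Assumption: each phoneme is matched to at most one phoneme on the other
    side, matches may not cross, and the total is normalized by the longer
    sequence so the result lies in [0, 1].
    """
    n, m = len(source), len(target)
    if n == 0 or m == 0:
        return 0.0
    # best[i][j] = highest total similarity using source[:i] and target[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = phoneme_similarity(source[i - 1], target[j - 1])
            best[i][j] = max(
                best[i - 1][j],             # leave source phoneme unmatched
                best[i][j - 1],             # leave target phoneme unmatched
                best[i - 1][j - 1] + pair,  # match the two phonemes
            )
    return best[n][m] / max(n, m)
```

A candidate translation pair could then be ranked by calling `phonetic_match_score` on the phoneme sequences of the two names, possibly combined with a semantic score from a bilingual dictionary. How the two clues are actually combined is not stated in the abstract.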