Combining probability models and web mining models: a framework for proper name transliteration

Authors:
Yilu Zhou;Feng Huang;Hsinchun Chen
Affiliations:
Department of Information Systems and Management, George Washington University, Washington, USA 20052;Consumer Electronic Group, Handheld Division, Advanced Micro Devices, Inc., Sunnyvale, USA;Department of Management Information Systems, The University of Arizona, Tucson, USA
Venue:
Information Technology and Management
Year:
2008

Citing 13
Cited 3

Algorithms for Arabic name transliteration

IBM Journal of Research and Development
Cross-language information access to multilingual collections on the internet

Journal of the American Society for Information Science - digital libraries: Part 1
Statistical transliteration for english-arabic cross language information retrieval

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Machine transliteration

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Automatic English-Chinese name transliteration for development of multilingual resources

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Proper name translation in cross-language information retrieval

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Anchor text mining for translation of Web queries: A transitive translation approach

ACM Transactions on Information Systems (TOIS)
Translating named entities using monolingual and bilingual resources

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Machine transliteration of names in Arabic text

SEMITIC '02 Proceedings of the ACL-02 workshop on Computational approaches to semitic languages
Backward machine transliteration by learning phonetic similarity

COLING-02 proceedings of the 6th conference on Natural language learning - Volume 20
Transliteration of proper names in cross-lingual information retrieval

MultiNER '03 Proceedings of the ACL 2003 workshop on Multilingual and mixed-language named entity recognition - Volume 15
Translating names and technical terms in Arabic text

Semitic '98 Proceedings of the Workshop on Computational Approaches to Semitic Languages
Web mining in soft computing framework: relevance, state of the art and future directions

IEEE Transactions on Neural Networks

ε-extension Hidden Markov Models and weighted transducers for machine transliteration

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Maximum N-gram HMM-based name transliteration: experiment in NEWS 2009 on English-Chinese corpus

NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Compositional Machine Transliteration

ACM Transactions on Asian Language Information Processing (TALIP)

Quantified Score

Hi-index	0.00

Visualization

Abstract

The rapid growth of the Internet has created a tremendous number of multilingual resources. However, language boundaries prevent information sharing and discovery across countries. Proper names play an important role in search queries and knowledge discovery. When foreign names are involved, proper names are often translated phonetically which is referred to as transliteration. In this research we propose a generic transliteration framework, which incorporates an enhanced Hidden Markov Model (HMM) and a Web mining model. We improved the traditional statistical-based transliteration in three areas: (1) incorporated a simple phonetic transliteration knowledge base; (2) incorporated a bigram and a trigram HMM; (3) incorporated a Web mining model that uses word frequency of occurrence information from the Web. We evaluated the framework on an English---Arabic back transliteration. Experiments showed that when using HMM alone, a combination of the bigram and trigram HMM approach performed the best for English---Arabic transliteration. While the bigram model alone achieved fairly good performance, the trigram model alone did not. The Web mining approach boosted the performance by 79.05%. Overall, our framework achieved a precision of 0.72 when the eight best transliterations were considered. Our results show promise for using transliteration techniques to improve multilingual Web retrieval.