An ensemble of transliteration models for information retrieval

Authors:
Jong-Hoon Oh;Key-Sun Choi
Affiliations:
Comp. Sci. Div., Dept. of EECS, Korea Term. Res. Ctr. for Lang. and Knowl. Eng. (KORTERM), Korea Adv. Inst. of Sci. and Technol. (KAIST), Daejeon, Republic of Korea and Natl. Inst. of Info. and Co ...;Computer Science Division, Department of EECS, Korea Terminology Research Center for Language and Knowledge Engineering (KORTERM), Korea Advanced Institute of Science and Technology (KAIST), Daeje ...
Venue:
Information Processing and Management: an International Journal
Year:
2006


Abstract

Transliteration is the phonetic translation of proper names and technical terms, especially from languages written in the Roman alphabet into languages written in non-Roman scripts, for example from English into Korean, Japanese, or Chinese. Because transliterations are often representative index terms for documents, handling them properly is important for an effective information retrieval (IR) system. Dictionary lookup alone is of limited use, however, because transliterations are usually not registered in dictionaries. For this reason, many researchers have tried to overcome the problem with machine transliteration. In this paper, we propose a method for improving machine transliteration using an ensemble of three different transliteration models. Because a single transliteration model cannot reflect all possible transliteration behaviors, several models should be used in a complementary manner to achieve a high-performance machine transliteration system. We describe a method that produces transliterations with several machine transliteration models and ranks them using web data and the relevance scores given by each model. We report evaluation results for our ensemble transliteration model and experimental results for its impact on IR effectiveness. Machine transliteration tests on English-to-Korean and English-to-Japanese transliteration show that the proposed method achieves 78-80% word accuracy. Information retrieval tests on the KTSET and NTCIR-1 test collections show that our transliteration model can improve the performance of an information retrieval system by about 10-34%.
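
The abstract does not say how the candidates from the individual models are pooled or how web data and relevance scores are combined, so the following is only a minimal sketch of the general idea under stated assumptions. Each model is assumed to return candidates with a relevance score in [0, 1], the web evidence is assumed to be a hit count per candidate, and the two signals are mixed linearly. The names `rank_candidates`, `web_count`, and `web_weight`, as well as the toy scores and counts, are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a model maps a source word to
# (candidate transliteration, relevance score in [0, 1]) pairs.
Model = Callable[[str], List[Tuple[str, float]]]


def rank_candidates(
    word: str,
    models: List[Model],
    web_count: Callable[[str], int],
    web_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Pool candidates from all models and rank them by a combined score.

    The linear mix of model score and web score is an assumption made for
    illustration; it is not the scoring function of the paper.
    """
    # Average each candidate's relevance score over all models
    # (a model that does not propose the candidate contributes 0).
    model_score: Dict[str, float] = defaultdict(float)
    for model in models:
        for candidate, score in model(word):
            model_score[candidate] += score / len(models)

    # Normalise web counts over the pooled candidates so both signals
    # lie in [0, 1].
    counts = {c: web_count(c) for c in model_score}
    total = sum(counts.values())
    web_score = {c: (counts[c] / total if total else 0.0) for c in counts}

    combined = {
        c: (1 - web_weight) * model_score[c] + web_weight * web_score[c]
        for c in model_score
    }
    # Highest combined score first; ties broken by candidate string.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Toy stand-ins for three models and a web-count lookup (made-up values).
toy_models: List[Model] = [
    lambda w: [("데이터", 0.6), ("데이타", 0.4)],
    lambda w: [("데이타", 0.9)],
    lambda w: [("데이터", 0.5)],
]
toy_counts = {"데이터": 900, "데이타": 100}

ranked = rank_candidates("data", toy_models, lambda c: toy_counts.get(c, 0))
for candidate, score in ranked:
    print(candidate, round(score, 3))
```

In this toy setup the averaged model scores alone slightly favour one spelling, while the web counts favour the other, which illustrates why pooling candidates from several models and re-ranking them with external evidence can change the top answer. Setting `web_weight=0.0` reduces the sketch to ranking by model scores only.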