Transliteration is the phonetic translation of proper names and technical terms, especially from languages written in the Roman alphabet into languages written in other scripts, for example from English into Korean, Japanese, or Chinese. Because transliterations are often representative index terms for documents, handling them properly is important for an effective information retrieval (IR) system. Dictionary lookup alone is of limited use, however, because transliterations are usually not registered in dictionaries. For this reason, many researchers have tried to overcome the problem with machine transliteration. In this paper, we propose a method for improving machine transliteration with an ensemble of three different transliteration models. Because a single transliteration model cannot reflect all possible transliteration behaviors, several models should be used in a complementary manner to achieve a high-performance machine transliteration system. We describe how transliterations are produced with the several machine transliteration models and how they are ranked using web data and the relevance scores given by each model. We report evaluation results for the ensemble transliteration model and experimental results for its impact on IR effectiveness. Machine transliteration tests on English-to-Korean and English-to-Japanese transliteration show that the proposed method achieves 78-80% word accuracy. Information retrieval tests on the KTSET and NTCIR-1 test collections show that our transliteration model can improve the performance of an information retrieval system by about 10-34%.
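The ranking step described above can be illustrated with a minimal sketch. The abstract does not give the actual scoring formula, so everything below is an assumption made for illustration: the function name `rank_candidates`, the `web_count` callback, the `web_weight` parameter, and the linear interpolation between averaged model scores and normalized web frequencies are all hypothetical, and the candidate strings and hit counts are toy values rather than data from the paper.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (candidate transliteration, relevance score assigned by one model)
Candidate = Tuple[str, float]


def rank_candidates(
    model_outputs: List[List[Candidate]],
    web_count: Callable[[str], int],
    web_weight: float = 0.5,
) -> List[Tuple[str, float]]:
    """Pool candidates from several models and rank them.

    Assumed scoring (not taken from the paper): the final score is a linear
    interpolation between the model relevance score averaged over all models
    and the candidate's share of the web hit counts.
    """
    n_models = len(model_outputs)
    model_score: Dict[str, float] = defaultdict(float)
    for candidates in model_outputs:
        for text, score in candidates:
            # A model that does not propose a candidate contributes zero.
            model_score[text] += score / n_models

    hits = {text: web_count(text) for text in model_score}
    total_hits = sum(hits.values())

    ranked = []
    for text, m_score in model_score.items():
        w_score = hits[text] / total_hits if total_hits else 0.0
        combined = (1.0 - web_weight) * m_score + web_weight * w_score
        ranked.append((text, combined))

    # Highest score first; ties are broken alphabetically for determinism.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# Toy example: three hypothetical models and made-up web hit counts.
toy_outputs = [
    [("cand_a", 0.6), ("cand_b", 0.4)],
    [("cand_b", 0.7), ("cand_c", 0.3)],
    [("cand_a", 0.5), ("cand_b", 0.5)],
]
toy_hits = {"cand_a": 10, "cand_b": 80, "cand_c": 10}

for text, score in rank_candidates(toy_outputs, lambda t: toy_hits.get(t, 0)):
    print(f"{text}\t{score:.3f}")
```

In this sketch a candidate proposed by several models and frequently attested on the web rises to the top, which is the complementary behavior the ensemble is meant to exploit. Setting `web_weight` to 0 ranks by model scores alone, and setting it to 1 ranks by web frequency alone.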