Out-of-vocabulary (OOV) words are problematic for cross-language information retrieval. One way to deal with OOV words when the two languages have different alphabets is to transliterate the unknown words, that is, to render them in the orthography of the second language. In this study, we present a simple statistical technique for training an English-to-Arabic transliteration model from pairs of names. We call this a selected n-gram model because training proceeds in two stages: the first stage learns which n-gram segments should be added to the unigram inventory for the source language, and the second stage learns the translation model over this inventory. The technique requires no heuristics or linguistic knowledge of either language. We evaluate the statistically trained model and a simpler hand-crafted model on a test set of named entities from the Arabic AFP corpus and demonstrate that both perform better than two online translation sources. We also explore the effectiveness of these systems on the TREC 2002 cross-language IR task. We find that transliterating either the OOV named entities alone or all OOV words is an effective approach for cross-language IR.
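The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_count` threshold, the greedy longest-match segmentation, and the toy name pairs are all assumptions made for illustration. The sketch also assumes that segment-level alignments between English and Arabic are already available from some external aligner, and for brevity the second stage reuses those same alignments instead of re-aligning over the enlarged inventory.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each name is a list of (English segment, Arabic segment)
# pairs, as if produced by an external aligner. Not taken from the paper.
ALIGNED_NAMES = [
    [("sh", "ش"), ("a", ""), ("r", "ر"), ("i", "ي"), ("f", "ف")],
    [("sh", "ش"), ("a", ""), ("m", "م"), ("s", "س")],
    [("s", "س"), ("a", "ا"), ("m", "م"), ("i", "ي")],
]

def select_ngrams(aligned_names, min_count=2):
    """Stage 1: keep multi-character source segments seen at least min_count times."""
    counts = Counter(
        src for name in aligned_names for src, _ in name if len(src) > 1
    )
    return {seg for seg, c in counts.items() if c >= min_count}

def train_table(aligned_names, inventory):
    """Stage 2: relative-frequency estimate of P(target | source segment),
    restricted to single characters plus the selected n-grams."""
    counts = defaultdict(Counter)
    for name in aligned_names:
        for src, tgt in name:
            if len(src) == 1 or src in inventory:
                counts[src][tgt] += 1
    return {
        src: {tgt: c / sum(tgts.values()) for tgt, c in tgts.items()}
        for src, tgts in counts.items()
    }

def segment(word, inventory, max_len=3):
    """Greedy longest-match segmentation over the enlarged inventory."""
    pieces, i = [], 0
    while i < len(word):
        for n in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if n == 1 or piece in inventory:
                pieces.append(piece)
                i += n
                break
    return pieces

def transliterate(word, inventory, table):
    """Map each segment to its most probable target; unseen segments map to ''."""
    out = []
    for seg in segment(word, inventory):
        options = table.get(seg)
        out.append(max(options, key=options.get) if options else "")
    return "".join(out)

inventory = select_ngrams(ALIGNED_NAMES)
table = train_table(ALIGNED_NAMES, inventory)
print(segment("shamir", inventory))
print(transliterate("shamir", inventory, table))
```

In this toy setting, "sh" is promoted to the inventory because it recurs as a unit, so an unseen name such as "shamir" is segmented with "sh" as a single symbol before the per-segment table is applied. Picking the single most probable target per segment is a simplification; a fuller model could rank several candidate transliterations by their combined probability.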