Statistical transliteration for English-Arabic cross language information retrieval
CIKM '03 Proceedings of the Twelfth International Conference on Information and Knowledge Management
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Machine transliteration of names in Arabic text
SEMITIC '02 Proceedings of the ACL-02 Workshop on Computational Approaches to Semitic Languages
Transliteration of proper names in cross-lingual information retrieval
MultiNER '03 Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-Language Named Entity Recognition - Volume 15
Online large-margin training of dependency parsers
ACL '05 Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics
Flexible text segmentation with structured multilabel classification
HLT '05 Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing
Discriminative methods for transliteration
EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Latent-variable modeling of string transductions with finite-state methods
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Name transliteration with bidirectional perceptron edit models
NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
In machine transliteration, a name is transcribed from one language into another while preserving its phonetic content. In this paper, we present a novel sequence transduction algorithm for machine transliteration. Our model is discriminatively trained with the MIRA algorithm, which improves on traditional perceptron training in three ways: (1) it considers the k-best transliterations rather than only the single best one; (2) it is trained on the ranking of these transliterations under a user-specified loss function (Levenshtein edit distance); (3) it exposes a built-in parameter that the user can tune to cope with noisy, non-separable training data. On an Arabic-English name transliteration task, our model achieves a relative error reduction of 2.2% over a perceptron-based model with similar features, and of 7.2% over a statistical machine translation model with more complex features.
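The training scheme the abstract describes (a margin scaled by Levenshtein loss over k-best hypotheses, with a cap on the update size) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the sparse-dictionary feature representation, the feature names in the usage example, and the default value of the aggressiveness parameter `C` are all hypothetical.

```python
def levenshtein(a, b):
    """Standard dynamic-programming edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def mira_update(weights, gold_feats, kbest, C=0.1):
    """One MIRA-style step over a k-best list.

    weights, gold_feats: sparse feature dicts (hypothetical representation).
    kbest: list of (hypothesis_feats, loss) pairs, where loss would be the
    Levenshtein distance between the hypothesis and the gold transliteration.
    C caps the step size, which is what lets training tolerate noisy,
    non-separable data.
    """
    for hyp_feats, loss in kbest:
        # Difference vector between gold and hypothesis features.
        keys = set(gold_feats) | set(hyp_feats)
        diff = {f: gold_feats.get(f, 0.0) - hyp_feats.get(f, 0.0) for f in keys}
        margin = sum(weights.get(f, 0.0) * v for f, v in diff.items())
        norm2 = sum(v * v for v in diff.values())
        if norm2 == 0.0:
            continue
        # Smallest update that makes the margin at least the loss, capped at C.
        tau = min(C, max(0.0, (loss - margin) / norm2))
        for f, v in diff.items():
            weights[f] = weights.get(f, 0.0) + tau * v
    return weights
```

A perceptron update would add the full difference vector for the single best wrong hypothesis; here the step size `tau` is instead solved per hypothesis from the loss-scaled margin constraint and clipped at `C`, which is the three-way improvement the abstract enumerates.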