This paper reports a novel hybrid approach to Urdu-to-Hindi transliteration that combines finite-state machine (FSM) techniques with a statistical word language model. The output of the FSM is filtered with the word language model to select the correct Hindi output. The main problem addressed is the omission of diacritical marks from the input Urdu text: the system can produce the correct Hindi output even when the crucial information carried by diacritical marks is absent. The approach improves accuracy over the transducer-only approach from 50.7% to 79.1%, a gain of 28.4 percentage points. The results show that a word language model can disambiguate the output of the transducer, especially when diacritical marks are not present in the Urdu input.
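The two-stage idea described above (a transducer that proposes candidates, then a word language model that picks among them) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the candidate table stands in for the finite-state transducer, the romanized tokens stand in for Urdu input and Hindi output, the bigram counts are invented, and the choice of an add-one smoothed bigram model with a Viterbi search is only one plausible way to realize a "word language model" filter.

```python
import math

# Hypothetical stand-in for the transducer: each undiacritized input token
# (a romanized consonant skeleton) maps to every output word it could denote.
CANDIDATES = {
    "kl": ["kal", "kul"],
    "ml": ["mil", "mal"],
    "gya": ["gaya"],
}

# Invented bigram counts standing in for a word language model that would
# normally be estimated from target-language text.
BIGRAM_COUNTS = {
    ("<s>", "kal"): 5,
    ("<s>", "kul"): 2,
    ("kal", "mil"): 4,
    ("kul", "mal"): 1,
    ("mil", "gaya"): 6,
    ("mal", "gaya"): 1,
}
VOCAB = {word for pair in BIGRAM_COUNTS for word in pair}


def bigram_logprob(prev, word):
    """Add-one smoothed log P(word | prev) from the toy counts."""
    numerator = BIGRAM_COUNTS.get((prev, word), 0) + 1
    denominator = sum(
        count for (p, _), count in BIGRAM_COUNTS.items() if p == prev
    ) + len(VOCAB)
    return math.log(numerator / denominator)


def transduce(token):
    """Return all candidate outputs; unknown tokens pass through unchanged."""
    return CANDIDATES.get(token, [token])


def best_transliteration(tokens):
    """Viterbi search over the candidate lattice, scored by the bigram model."""
    # best maps the last chosen word to (log score, path that ends in it).
    best = {"<s>": (0.0, [])}
    for token in tokens:
        step = {}
        for word in transduce(token):
            step[word] = max(
                (
                    (score + bigram_logprob(prev, word), path + [word])
                    for prev, (score, path) in best.items()
                ),
                key=lambda item: item[0],
            )
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    print(best_transliteration(["kl", "ml", "gya"]))
```

In this sketch the transducer alone cannot choose between the candidates for "kl" or "ml"; the language model resolves the ambiguity by preferring the word sequence that is most probable under the toy counts. A real system would replace the lookup table with an actual transducer and the counts with a model trained on Hindi text.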