A hybrid model for Urdu Hindi transliteration

Authors:
Abbas Malik;Laurent Besacier;Christian Boitet;Pushpak Bhattacharyya
Affiliations:
Université Joseph Fourier;Université Joseph Fourier;Université Joseph Fourier;IIT Bombay
Venue:
NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Year:
2009

Abstract

We report a novel hybrid approach to Urdu-to-Hindi transliteration that combines finite-state machine (FSM) techniques with a statistical word language model. The output of the FSM is filtered with the word language model to produce the correct Hindi output. The main problem addressed is the omission of diacritical marks from the input Urdu text: our system produces the correct Hindi output even when the crucial information carried by these marks is absent. The hybrid approach raises accuracy from 50.7% for the transducer-only approach to 79.1%. These results show that a word language model can disambiguate the output of the transducer and improve performance, especially when diacritical marks are not present in the Urdu input.
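
The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `CANDIDATES` table is a hypothetical stand-in for the finite-state transducer, the three-sentence Hindi corpus is invented for illustration, and the choice of a bigram model with add-one smoothing is an assumption (the abstract says only "word language model"). The function names `train_bigrams`, `log_prob`, and `transliterate` are likewise hypothetical.

```python
import itertools
import math
from collections import Counter

# Hypothetical stand-in for the transducer: each undiacritized Urdu word maps
# to the Hindi spellings an FSM might emit when short vowels are unmarked.
CANDIDATES = {
    "کل": ["कल", "कुल"],   # kal / kul, ambiguous without diacritics
    "رات": ["रात"],        # raat
    "تعداد": ["तादाद"],    # taadaad
}

# Toy Hindi corpus (invented) used to estimate the word language model.
CORPUS = [
    "कल रात बारिश हुई",
    "कुल तादाद दस है",
    "कल रात देर हुई",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, with a sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)


def log_prob(words):
    """Add-one smoothed bigram log-probability of a Hindi word sequence."""
    vocab_size = len(UNIGRAMS)
    tokens = ["<s>"] + list(words)
    return sum(
        math.log((BIGRAMS[(prev, cur)] + 1) / (UNIGRAMS[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )


def transliterate(urdu_words):
    """Pick the candidate sequence that the language model scores highest."""
    # Words missing from the table pass through unchanged (an assumption).
    options = [CANDIDATES.get(word, [word]) for word in urdu_words]
    best = max(itertools.product(*options), key=log_prob)
    return " ".join(best)


if __name__ == "__main__":
    print(transliterate(["کل", "رات"]))
    print(transliterate(["کل", "تعداد"]))
```

With this toy corpus, the same undiacritized word resolves differently depending on its neighbour: the first input yields कल रात and the second yields कुल तादाद. The exhaustive search over `itertools.product` is only practical for short inputs; a longer sentence would call for a dynamic-programming search such as Viterbi.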