A tutorial on hidden Markov models and selected applications in speech recognition
Readings in speech recognition
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Statistical transliteration for english-arabic cross language information retrieval
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
HMM-based word alignment in statistical translation
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Inducing multilingual POS taggers and NP bracketers via robust projection across aligned corpora
NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Statistical phrase-based translation
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Improving IBM word-alignment model 1
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
A maximum entropy word aligner for Arabic-English machine translation
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Word alignment via quadratic assignment
HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Hindi Urdu machine transliteration using finite-state transducers
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
A hybrid model for Urdu Hindi transliteration
NEWS '09 Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration
Hi-index | 0.00 |
Hindi and Urdu share a common phonology, morphology and grammar but are written in different scripts. In addition, the vocabularies have also diverged significantly especially in the written form. In this paper we show that we can get reasonable quality translations (we estimated the Translation Error rate at 18%) between the two languages even in absence of a parallel corpus. Linguistic resources such as treebanks, part of speech tagged data and parallel corpora with English are limited for both these languages. We use the translation system to share linguistic resources between the two languages. We demonstrate improvements on three tasks and show: statistical machine translation from Urdu to English is improved (0.8 in BLEU score) by using a Hindi-English parallel corpus, Hindi part of speech tagging is improved (upto 6% absolute) by using an Urdu part of speech corpus and a Hindi-English word aligner is improved by using a manually word aligned Urdu-English corpus (upto 9% absolute in F-Measure).