English to Malayalam translation: a statistical approach

Authors:
Mary Priya Sebastian;Sheena Kurian K;G. Santhosh Kumar
Affiliations:
Cochin University of Science and Technology, Kerala, India;Cochin University of Science and Technology, Kerala, India;Cochin University of Science and Technology, Cochin, Kerala, India
Venue:
Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India
Year:
2010

Abstract

This paper outlines a methodology for translating text from English into Malayalam, a Dravidian language, using statistical models. A monolingual Malayalam corpus and a bilingual English-Malayalam corpus are used in the training phase, after which the system automatically generates Malayalam translations of English sentences. The paper also discusses a technique for improving the alignment model by incorporating part-of-speech information into the bilingual corpus; removing insignificant alignments from the sentence pairs in this way yielded better training results. Pre-processing techniques such as suffix separation in the Malayalam corpus and stop-word elimination in the bilingual corpus also proved effective in training. The paper presents the handcrafted rules designed for the suffix separation process, which can serve as a guideline for implementing suffix separation in Malayalam. The structural difference between English and Malayalam is resolved in the decoder by applying order conversion rules. Experiments conducted on a sample corpus generated reasonably good Malayalam translations, and the results were evaluated with the F-measure, BLEU, and WER metrics.
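Of the evaluation metrics named in the abstract, WER (word error rate) is the simplest to state: the word-level edit distance between a system translation and a reference translation, divided by the reference length. The sketch below is only an illustration of that standard definition, not the authors' evaluation code. The function name `word_error_rate`, the whitespace tokenization, and the placeholder tokens in the usage example are all assumptions made here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are separated by whitespace. A real evaluation of
    Malayalam output would likely need a language-aware tokenizer.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Placeholder tokens stand in for real reference/system sentences.
if __name__ == "__main__":
    print(word_error_rate("w1 w2 w3 w4", "w1 wX w3"))
```

In the usage line, the hypothesis differs from the reference by one substituted word and one missing word, so the edit distance is 2 over a reference of 4 words. Lower WER is better, and because insertions are counted, the value can exceed 1 when the hypothesis is much longer than the reference.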