Word alignment in English-Hindi parallel corpus using recency-vector approach: some studies

Authors:
Niladri Chatterjee;Saumya Agrawal
Affiliations:
Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India;Indian Institute of Technology, Kharagpur, West Bengal, India
Venue:
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Year:
2006

Abstract

Word alignment using recency-vector-based approaches has recently become popular. One major advantage of these techniques is that, unlike other approaches, they perform well even when the parallel corpus is small. This makes these algorithms worth studying for languages where resources are scarce. In this work we studied the performance of two very popular recency-vector-based approaches, proposed in (Fung and McKeown, 1994) and (Somers, 1998), respectively, for word alignment in an English-Hindi parallel corpus. The performance of these algorithms, however, was not found to be satisfactory. Subsequent addition of some new constraints improved the performance of the recency-vector-based alignment technique significantly for this corpus. The present paper discusses the new version of the algorithm and its performance in detail.
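
For readers unfamiliar with the term, the following is a minimal sketch of what a recency vector is. It assumes the common definition, in which the vector holds the gaps between successive occurrences of a word in a text. The function names (position_vector, recency_vector, dtw_distance), the token-index positions, and the toy token list are illustrative assumptions, and the plain dynamic-time-warping comparison is only one possible way to compare two such vectors. This is not the constrained algorithm proposed in the paper.

```python
def position_vector(tokens, word):
    """Return the token indices at which `word` occurs (assumed definition)."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def recency_vector(tokens, word):
    """Return the gaps between successive occurrences of `word`."""
    pos = position_vector(tokens, word)
    return [b - a for a, b in zip(pos, pos[1:])]


def dtw_distance(u, v):
    """Plain dynamic-time-warping cost between two numeric sequences.

    Illustrative only: no path constraints or normalisation are applied.
    """
    inf = float("inf")
    cost = [[inf] * (len(v) + 1) for _ in range(len(u) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(u) + 1):
        for j in range(1, len(v) + 1):
            step = abs(u[i - 1] - v[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # skip an element of u
                                    cost[i][j - 1],      # skip an element of v
                                    cost[i - 1][j - 1])  # match both
    return cost[len(u)][len(v)]


# Toy data (hypothetical): "a" occurs at indices 0, 2 and 5.
toy_tokens = ["a", "b", "a", "c", "c", "a"]
toy_recency = recency_vector(toy_tokens, "a")
```

Under this sketch, a source word and a target word whose recency vectors have a low warping cost would be treated as candidate translations of each other. The intuition is that words which are translations of each other tend to recur at similar intervals in the two halves of a parallel corpus.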