Improved algorithm for automatic word alignment for Hindi-Punjabi parallel corpus

Authors:
Karuna Jindal;Vishal Goyal
Affiliations:
Department of Computer Science, Punjabi University, Patiala, India;Department of Computer Science, Punjabi University, Patiala, India
Venue:
ICDEM'10 Proceedings of the Second international conference on Data Engineering and Management
Year:
2010

Abstract

This paper describes a system that aligns texts at the word level in a Hindi-Punjabi parallel corpus. Word alignment means identifying correspondences between words in source-language and target-language sentences. The previous aligner was based on a length-based estimation approach, and in that version multi-word units, and sometimes one-to-one correspondences, produced alignment errors. The improved version uses several techniques to raise accuracy: Boundary Detection, Dictionary-Lookup (DL), Nearest-align-Neighbor (NAN), and a scoring-based minimum distance function. Automatic word alignment of a Hindi-Punjabi corpus is very useful for automatically developing a Hindi-Punjabi dictionary. The accuracy of the previous version was claimed to be approximately 89.5%, but rigorous testing found it to be 65%. After the above techniques were implemented in the improved system described here, accuracy was found to be 99.09% for one-to-one word alignment and 80% for multi-word alignment.
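
The abstract names dictionary lookup and a nearest-aligned-neighbor step but does not give their details, so the following is only a rough sketch of how two such passes might be combined, not the authors' implementation. The function name align_words, the two-pass structure, the distance rule, and the romanized toy tokens and dictionary are all assumptions made for illustration.

```python
def align_words(source, target, dictionary):
    """Toy word aligner: returns sorted (source_index, target_index) pairs.

    Hypothetical sketch only. `dictionary` maps a source word to a set of
    acceptable target words; it stands in for a bilingual lexicon.
    """
    alignment = {}
    used = set()

    # Pass 1 (dictionary lookup): link each source word to the closest
    # unused target position whose word is a dictionary translation.
    for i, word in enumerate(source):
        candidates = dictionary.get(word, set())
        matches = [j for j, t in enumerate(target)
                   if t in candidates and j not in used]
        if matches:
            j = min(matches, key=lambda pos: abs(pos - i))
            alignment[i] = j
            used.add(j)

    # Pass 2 (nearest aligned neighbor, as assumed here): for each word left
    # over, find the nearest source word aligned in pass 1, project its
    # offset onto the target side, and take the closest free target position.
    anchors = dict(alignment)  # snapshot of the dictionary-based links
    for i in range(len(source)):
        if i in alignment:
            continue
        free = [j for j in range(len(target)) if j not in used]
        if not free:
            break  # more source words than target words: leave unaligned
        if anchors:
            nearest = min(anchors, key=lambda k: abs(k - i))
            expected = anchors[nearest] + (i - nearest)
        else:
            expected = i  # no anchors at all: fall back to same position
        j = min(free, key=lambda pos: abs(pos - expected))
        alignment[i] = j
        used.add(j)

    return sorted(alignment.items())


# Romanized toy sentence pair and a deliberately incomplete toy dictionary.
hindi = ["main", "ghar", "ja", "raha", "hoon"]
punjabi = ["main", "ghar", "ja", "riha", "haan"]
toy_dictionary = {"main": {"main"}, "ghar": {"ghar"}, "hoon": {"haan"}}
pairs = align_words(hindi, punjabi, toy_dictionary)
```

In this toy pair the dictionary links three of the five words, and the remaining two are placed by their position relative to the nearest dictionary-aligned neighbor. Multi-word units, boundary detection, and the scoring function mentioned in the abstract are not modeled here.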