Aligning words in English-Hindi parallel corpora

Authors:
Niraj Aswani; Robert Gaizauskas
Affiliations:
University of Sheffield, Sheffield, UK; University of Sheffield, Sheffield, UK
Venue:
ParaText '05 Proceedings of the ACL Workshop on Building and Using Parallel Texts
Year:
2005

Citing 1
Cited 3

Rapid customization of an information extraction system for a surprise language

ACM Transactions on Asian Language Information Processing (TALIP)

Word alignment for languages with scarce resources using bilingual corpora of other language pairs

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions

Pivot language approach for phrase-based statistical machine translation

Machine Translation

Improved algorithm for automatic word alignment for Hindi-Punjabi parallel corpus

ICDEM'10 Proceedings of the Second international conference on Data Engineering and Management

Abstract

In this paper, we describe a word alignment algorithm for English-Hindi parallel data. The system was developed for the shared task on word alignment for languages with scarce resources at the ACL 2005 workshop "Building and Using Parallel Texts: Data-Driven Machine Translation and Beyond". The algorithm is a hybrid method: it performs local word grouping on Hindi sentences and combines this with other methods such as dictionary lookup, transliteration similarity, expected English words and nearest aligned neighbours. We used the provided training data to obtain a list of named entities and cognates and to collect rules for local word grouping in Hindi sentences. The system scored 77.03% precision and 60.68% recall on the shared task's unseen test data.
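
To make the hybrid idea concrete, here is a minimal sketch of two of the components named in the abstract, dictionary lookup followed by transliteration similarity, together with the precision and recall measures used for scoring. It is not the authors' implementation. It assumes the Hindi tokens have already been romanised, uses Python's `difflib.SequenceMatcher` as a stand-in for whatever similarity measure the paper uses, and omits local word grouping, expected English words and nearest aligned neighbours. The function names, the toy dictionary and the 0.7 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align(english, hindi_romanised, dictionary, threshold=0.7):
    """Return sorted (english_index, hindi_index) links.

    dictionary maps a lowercase English word to a set of romanised
    Hindi translations (a hypothetical toy format).
    """
    links = {}
    used = set()

    # Pass 1: dictionary lookup.
    for i, word in enumerate(english):
        candidates = dictionary.get(word.lower(), set())
        for j, token in enumerate(hindi_romanised):
            if j not in used and token in candidates:
                links[i] = j
                used.add(j)
                break

    # Pass 2: transliteration similarity for words still unaligned,
    # which is where named entities and cognates tend to be caught.
    for i, word in enumerate(english):
        if i in links:
            continue
        best_j, best_score = None, threshold
        for j, token in enumerate(hindi_romanised):
            if j in used:
                continue
            score = similarity(word, token)
            if score >= best_score:
                best_j, best_score = j, score
        if best_j is not None:
            links[i] = best_j
            used.add(best_j)

    return sorted(links.items())


def precision_recall(predicted, gold):
    """Precision and recall of predicted links against gold links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    english = ["Gandhi", "is", "great"]
    hindi = ["gaandhi", "mahaan", "hain"]  # assumed romanisation
    toy_dictionary = {"is": {"hai", "hain"}, "great": {"mahaan"}}
    print(align(english, hindi, toy_dictionary))
```

In this toy run, "is" and "great" are linked through the dictionary, while "Gandhi" has no dictionary entry and is linked to "gaandhi" only by string similarity. A one-to-one greedy strategy like this is a simplification; real English-Hindi alignments often involve one-to-many links, which is part of what the paper's local word grouping addresses.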