Automatic alignment in parallel corpora

Authors:
Harris Papageorgiou;Lambros Cranias;Stelios Piperidis
Affiliations:
Institute for Language and Speech Processing, Athens, Greece;Institute for Language and Speech Processing, Athens, Greece;Institute for Language and Speech Processing, Athens, Greece
Venue:
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Year:
1994



Abstract

This paper addresses alignment in the context of exploiting large bilingual and multilingual corpora for translation purposes. It proposes a generic alignment scheme that can meet the varying requirements of different applications. Depending on the level at which alignment is sought, appropriate surface linguistic information is invoked, coupled with information about possible unit delimiters. Each text unit (sentence, clause or phrase) is represented by the sum of its content tags. These representations are then fed into a dynamic programming framework that computes the optimum alignment of units. The scheme has been tested at sentence level on parallel corpora of the CELEX database, where the success rate exceeded 99%. The next steps of the work concern testing the scheme's efficiency at lower levels, given the necessary bilingual information about potential delimiters.
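
The dynamic programming step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the cost function or the permitted alignment patterns, so the bead types, the penalties in BEADS, the absolute-difference mismatch cost, and the function name align are all assumptions chosen for illustration. Each unit is reduced to a single integer, its count of content tags, which is one plausible reading of "the sum of its content tags".

```python
from typing import List, Optional, Tuple

# Hypothetical alignment patterns ("beads"): (source units, target units) -> penalty.
# These values are illustrative assumptions, not figures from the paper.
BEADS = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 1.0, (1, 2): 1.0}


def align(src: List[int], tgt: List[int]) -> List[Tuple[List[int], List[int]]]:
    """Align two sequences of units, each given as its content-tag count.

    Returns a list of beads; each bead pairs the indices of the source
    units with the indices of the target units aligned to them.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i source and j target units.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                # Assumed mismatch cost: difference in total content-tag counts.
                mismatch = abs(sum(src[i:ni]) - sum(tgt[j:nj]))
                candidate = cost[i][j] + mismatch + penalty
                if candidate < cost[ni][nj]:
                    cost[ni][nj] = candidate
                    back[ni][nj] = (i, j)

    # Trace the cheapest path back from the end of both texts.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads
```

With source counts [5, 3, 4] and target counts [5, 7], this sketch pairs the first units one-to-one and merges the second and third source units against the second target unit, because 3 + 4 matches 7 exactly. A realistic system would replace the toy cost with one estimated from data.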