Using cognates to align sentences in bilingual corpora

Authors:
Michel Simard;George F. Foster;Pierre Isabelle
Affiliations:
Centre for Information Technologies Innovation, Laval, Québec, Canada;Centre for Information Technologies Innovation, Laval, Québec, Canada;Centre for Information Technologies Innovation, Laval, Québec, Canada
Venue:
CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: distributed computing - Volume 2
Year:
1993

Citing 3
Cited 10

A statistical approach to machine translation

Computational Linguistics
Aligning sentences in parallel corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
A program for aligning sentences in bilingual corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics

Translating collocations for bilingual lexicons: a statistical approach

Computational Linguistics
The Origins of the Translator's Workstation

Machine Translation
Semi-automatic acquisition of domain-specific translation lexicons

ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Text alignment in the real world: improving alignments of noisy translations using common lexical features, string matching strategies and n-gram comparisons

EACL '95 Proceedings of the seventh conference on European chapter of the Association for Computational Linguistics
An alignment method for noisy parallel corpora based on image processing techniques

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
A portable algorithm for mapping bitext correspondence

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Methods and practical issues in evaluating alignment techniques

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Creating a multilingual collocation dictionary from large text corpora

EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 2
Fast-Champollion: a fast and robust sentence alignment algorithm

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Aligning the un-alignable -- a pilot study using a noisy corpus of nonstandardized, semi-parallel texts

CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II

Abstract

In a recent paper, Gale and Church describe an inexpensive method for aligning bitext, based exclusively on sentence lengths [3]. While this method produces surprisingly good results (a success rate of around 96%), even better results are required to perform such tasks as the computer-assisted revision of translations. In this paper, we examine some of the weaknesses of Gale and Church's program, and explain how just a small amount of linguistic knowledge would help to overcome these weaknesses. We discuss how cognates provide a cheap and reasonably reliable source of linguistic knowledge. To illustrate this, we describe a modification to the program in which the alignment criterion is cognates rather than sentence lengths. Finally, we show how better results can be obtained more efficiently by combining the two criteria, length and "cognateness". Our method can be generalized to accommodate other sources of linguistic knowledge, and experimentation shows that it produces better results than alignments based on length alone, at a minimal cost.
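
The abstract does not say how cognates are detected or scored, so the following is only an illustrative sketch of what a "cognateness" measure for a candidate sentence pair might look like. It assumes a simple surface heuristic (two words count as cognates if they share their first four characters, and tokens containing digits must match exactly); the function names `tokens`, `is_cognate`, and `cognateness`, the `prefix_len` parameter, and the normalization are all hypothetical choices made for this example, not taken from the paper.

```python
import re


def tokens(sentence):
    """Split a sentence into lowercased word and number tokens."""
    return re.findall(r"\w+", sentence.lower())


def is_cognate(a, b, prefix_len=4):
    """Assumed heuristic: shared prefix for words, exact match for numbers."""
    if any(c.isdigit() for c in a) or any(c.isdigit() for c in b):
        return a == b
    if len(a) < prefix_len or len(b) < prefix_len:
        return False
    return a[:prefix_len] == b[:prefix_len]


def cognateness(src, tgt):
    """Fraction of tokens in a sentence pair that take part in a cognate match.

    Each target token is matched at most once (greedy, left to right).
    Returns a value between 0.0 and 1.0.
    """
    s, t = tokens(src), tokens(tgt)
    if not s or not t:
        return 0.0
    used = set()
    matches = 0
    for a in s:
        for j, b in enumerate(t):
            if j not in used and is_cognate(a, b):
                used.add(j)
                matches += 1
                break
    return 2 * matches / (len(s) + len(t))
```

Under these assumptions, a correctly paired English and French sentence that share a date or a word such as "budget" scores higher than an unrelated pair. An aligner could use such a score alongside a length-based criterion when ranking candidate pairings. Note that a fixed-prefix test misses pairs like "government" and "gouvernement", so it should be read as a cheap approximation only.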