An experiment in hybrid dictionary and statistical sentence alignment

Authors:
Nigel Collier;Kenji Ono;Hideki Hirakawa
Affiliations:
Toshiba Corporation, Kanagawa, Japan;Toshiba Corporation, Kanagawa, Japan;Toshiba Corporation, Kanagawa, Japan
Venue:
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Year:
1998

References cited (9):

A program for aligning sentences in bilingual corpora. Computational Linguistics - Special issue on using large corpora: I
Text-translation alignment. Computational Linguistics - Special issue on using large corpora: I
Machine translation vs. dictionary term translation: a comparison for English-Japanese news article alignment. COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Aligning sentences in parallel corpora. ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
A program for aligning sentences in bilingual corpora. ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Char_align: a program for aligning parallel texts at the character level. ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Aligning sentences in bilingual corpora using lexical information. ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Aligning a parallel English-Chinese corpus statistically with lexical criteria. ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Bilingual text matching using bilingual dictionary and statistics. COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2

Cited by (4):

Has There Been a Revolution in Machine Translation? Machine Translation
A Multilingual Procedure for Dictionary-Based Sentence Alignment. AMTA '98 Proceedings of the Third Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup
Automatic extraction of bilingual word pairs using inductive chain learning in various languages. Information Processing and Management: an International Journal
Sentence alignment of Hungarian-English parallel corpora using a hybrid algorithm. Acta Cybernetica

Abstract

The task of aligning sentences in parallel corpora of two languages has been well studied using pure statistical or linguistic models. We developed a linguistic method based on lexical matching with a bilingual dictionary and two statistical methods based on sentence length ratios and sentence offset probabilities. This paper seeks to further our knowledge of the alignment task by comparing the performance of the alignment models when used separately and together, i.e. as a hybrid system. Our results show that for our English-Japanese corpus of newspaper articles, the hybrid system using lexical matching and sentence length ratios outperforms the pure methods.
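
To make the idea of a hybrid score concrete, here is a minimal sketch of how a lexical-matching score and a sentence-length-ratio score might be combined for one candidate sentence pair. It is not the authors' implementation: the function names, the Dice-style overlap, the Gaussian-shaped length score, the token-based length measure, the parameters `mean_ratio`, `std` and `weight`, and the toy romanized dictionary are all assumptions made for illustration.

```python
import math


def lexical_score(src_tokens, tgt_tokens, dictionary):
    """Dice-style overlap: counts source tokens that have at least one
    dictionary translation present in the target sentence."""
    tgt = set(tgt_tokens)
    matched = sum(1 for w in src_tokens if dictionary.get(w, set()) & tgt)
    total = len(src_tokens) + len(tgt_tokens)
    return 2.0 * matched / total if total else 0.0


def length_score(src_len, tgt_len, mean_ratio=1.0, std=0.5):
    """Score in [0, 1] that peaks when tgt_len / src_len equals mean_ratio.
    mean_ratio and std are hypothetical values, not taken from the paper."""
    if src_len == 0:
        return 0.0
    z = (tgt_len / src_len - mean_ratio) / std
    return math.exp(-0.5 * z * z)


def hybrid_score(src_tokens, tgt_tokens, dictionary, weight=0.5):
    """Weighted sum of the lexical and length scores (assumed combination)."""
    lex = lexical_score(src_tokens, tgt_tokens, dictionary)
    length = length_score(len(src_tokens), len(tgt_tokens))
    return weight * lex + (1.0 - weight) * length


# Toy English-to-romanized-Japanese dictionary, invented for this example.
toy_dictionary = {"cat": {"neko"}, "fish": {"sakana"}, "eats": {"taberu"}}
source = ["the", "cat", "eats", "fish"]
candidate_a = ["neko", "ga", "sakana", "wo", "taberu"]  # plausible translation
candidate_b = ["kyou", "wa", "ame"]                     # unrelated sentence

print(hybrid_score(source, candidate_a, toy_dictionary))
print(hybrid_score(source, candidate_b, toy_dictionary))
```

In this toy case the two candidates deviate from the assumed length ratio by the same amount, so the length score alone cannot separate them; the dictionary evidence breaks the tie. A full aligner would still need a search procedure, such as dynamic programming over candidate pairings, which this sketch leaves out.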