High-performance bilingual text alignment using statistical and dictionary information

Authors:
Masahiko Haruno;Takefumi Yamazaki
Affiliations:
NTT Communication Science Laboratories, 1-1 Hikarinooka Yokosuka Kanagawa 239, Japan;NTT Communication Science Laboratories, 1-1 Hikarinooka Yokosuka Kanagawa 239, Japan
Venue:
Natural Language Engineering
Year:
1997

Citing 21
Cited 2

Some advances in transformation-based part of speech tagging

AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1)
A program for aligning sentences in bilingual corpora

Computational Linguistics - Special issue on using large corpora: I
Text-translation alignment

Computational Linguistics - Special issue on using large corpora: I
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
Termight: identifying and translating technical terminology

ANLC '94 Proceedings of the fourth conference on Applied natural language processing
A simple rule-based part of speech tagger

ANLC '92 Proceedings of the third conference on Applied natural language processing
Aligning sentences in parallel corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Char_align: a program for aligning parallel texts at the character level

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Aligning sentences in bilingual corpora using lexical information

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
An algorithm for finding noun phrase correspondences in bilingual corpora

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Structural matching of parallel texts

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
A pattern matching method for finding noun and proper noun translations from noisy parallel corpora

ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics
Aligning a parallel English-Chinese corpus statistically with lexical criteria

ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Toward memory-based translation

COLING '90 Proceedings of the 13th conference on Computational linguistics - Volume 3
Bilingual text, matching using bilingual dictionary and statistics

COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
K-vec: a new approach for aligning parallel texts

COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 2
Building an MT dictionary from parallel texts based on linguistic and statistical information

COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Learning translation templates from bilingual text

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2
CTM: an example-based translation aid system

COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 4
Learning bilingual collocations by word-level sorting

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Translating collocations for use in bilingual lexicons

HLT '94 Proceedings of the workshop on Human Language Technology

A Multilingual Procedure for Dictionary-Based Sentence Alignment

AMTA '98 Proceedings of the Third Conference of the Association for Machine Translation in the Americas on Machine Translation and the Information Soup
Extracting parallel paragraphs and sentences from english-persian translated documents

AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper describes an accurate and robust text alignment system for structurally different languages. Among structurally different languages such as Japanese and English, there is a limitation on the amount of word correspondences that can be statistically acquired. The main reason for this is the systems of functional (closed) words are quite different in the two languages. The proposed method makes use of two kinds of word correspondences in aligning bilingual texts. One is a bilingual dictionary of general use. The other is the word correspondences that are statistically acquired in the alignment process. Our method gradually determines sentence pairs (anchors) that correspond to each other by relaxing parameters. The method, by combining two kinds of word correspondences, achieves adequate word correspondences for complete alignment. As a result, texts of various length and of various genres in structurally different languages can be aligned with high precision. Experimental results show our system outperforms conventional methods for various kinds of Japanese–English texts.