Using natural alignment to extract translation equivalents

Authors:
Pablo Gamallo Otero
Affiliations:
Departamento de Língua Espanhola, Faculdade de Filologia, Universidade de Santiago de Compostela, Galiza, Spain
Venue:
PROPOR'06 Proceedings of the 7th international conference on Computational Processing of the Portuguese Language
Year:
2006

Citing 9
Cited 1

Identifying word correspondence in parallel texts

HLT '91 Proceedings of the workshop on Speech and Natural Language
Translating collocations for bilingual lexicons: a statistical approach

Computational Linguistics
Bilingual Sentence Alignment: Balancing Robustness and Accuracy

Machine Translation
Bitext maps and alignment via pattern recognition

Computational Linguistics
A word-to-word model of translational equivalence

ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
A simple hybrid aligner for generating lexical correspondences in parallel texts

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Aligning sentences in parallel corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Char_align: a program for aligning parallel texts at the character level

ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Using confidence bands for parallel texts alignment

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics

The english unknown term translation mining with improved bilingual snippets collection strategy

ICIC'12 Proceedings of the 8th international conference on Intelligent Computing Theories and Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

Most methods to extract bilingual lexicons from parallel corpora learn word correspondences using relative small aligned segments, called sentences. Then, they need to get a corpus aligned at the sentence level. Such an alignment can require further manual corrections if the parallel corpus contains insertions, deletions, or fuzzy sentence boundaries. This paper shows that it is possible to extract bilingual lexicons without aligning parallel texts at the sentence level. We describe a method to learn word translations from a very roughly aligned corpus, namely a corpus with quite long segments separated by “natural boundaries”. The results obtained using this method are very close to those obtained using sentence alignment. Some experiments were performed on English-Portuguese and English-Spanish parallel texts.