Paragraph-Level Alignment of an English-Spanish Parallel Corpus of Fiction Texts Using Bilingual Dictionaries

Authors:
Alexander Gelbukh;Grigori Sidorov;José Ángel Vera-Félix
Affiliations:
Natural Language and Text Processing Laboratory, Center for Research in Computer Science, National Polytechnic Institute, Mexico City, Mexico;Natural Language and Text Processing Laboratory, Center for Research in Computer Science, National Polytechnic Institute, Mexico City, Mexico;Natural Language and Text Processing Laboratory, Center for Research in Computer Science, National Polytechnic Institute, Mexico City, Mexico
Venue:
TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue
Year:
2006

Abstract

Aligned parallel corpora are important linguistic resources, useful in many text processing tasks such as machine translation, word sense disambiguation, and dictionary compilation. Nevertheless, few resources of this type are available, especially for fiction texts, owing to the difficulty of collecting the texts and the high cost of manual alignment. In this paper, we describe an automatically aligned English-Spanish parallel corpus of fiction texts and evaluate our alignment method, which uses linguistic data, namely existing bilingual dictionaries, to calculate word similarity. The method is based on a simple idea: if a meaningful word is present in the source text, then one of its dictionary translations should be present in the target text. Experimental results of alignment at the paragraph level are described.
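
The dictionary-based idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy dictionary, the stop-word list, the tokenizer, and the function names `tokenize` and `translation_coverage` are all hypothetical, and the score (the fraction of source content words with at least one dictionary translation in the target paragraph) is only one plausible reading of "word similarity". The sketch also skips lemmatization and ignores words missing from the dictionary, both of which a real system would need to handle.

```python
from typing import Dict, List, Set

# Hypothetical toy English-Spanish dictionary (assumption, for illustration only).
TOY_DICTIONARY: Dict[str, Set[str]] = {
    "house": {"casa", "hogar"},
    "dog": {"perro"},
    "white": {"blanco", "blanca"},
    "night": {"noche"},
}

# Hypothetical stop-word list used to keep only "meaningful" words.
STOP_WORDS: Set[str] = {"the", "a", "an", "of", "in", "and", "was", "is"}


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(".,;:!?\"'()¿¡").lower() for w in text.split())
    return [t for t in tokens if t]


def translation_coverage(
    source: str,
    target: str,
    dictionary: Dict[str, Set[str]],
    stop_words: Set[str],
) -> float:
    """Fraction of source content words whose translation occurs in target.

    Words absent from the dictionary are skipped (an assumption of this sketch).
    Returns 0.0 when no source word can be checked.
    """
    target_words = set(tokenize(target))
    content = [
        w for w in tokenize(source)
        if w not in stop_words and w in dictionary
    ]
    if not content:
        return 0.0
    hits = sum(1 for w in content if dictionary[w] & target_words)
    return hits / len(content)


if __name__ == "__main__":
    english = "The white house was in the night."
    candidates = ["El perro come.", "La casa blanca estaba en la noche."]
    scores = [
        translation_coverage(english, c, TOY_DICTIONARY, STOP_WORDS)
        for c in candidates
    ]
    # Pick the candidate paragraph with the highest coverage score.
    best = max(range(len(candidates)), key=lambda i: scores[i])
    print(candidates[best], scores[best])
```

A paragraph aligner could compute this score for each candidate pair of paragraphs and prefer the pairs with the highest coverage. How the scores are combined into a full alignment is not specified in the abstract and is left out of the sketch.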