We outline a robust, precision-oriented alignment method for a corpus of comparable texts that lack standardized spelling and sentence boundary marking. The method uses a bilingual dictionary to identify comparable sequences in a source and a target text, assigns each candidate a confidence score using several measures, and keeps only the highest-scoring sequences. For comparison, a conventional alignment is performed after heuristic sentence splitting. Both methods are evaluated on transcriptions of two historical documents written in different Early New High German dialects, and the proposed method outperforms the conventional one by a wide margin.
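The general idea of dictionary-based, precision-oriented sequence alignment can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed window size, the coverage-based score, the threshold, and the greedy selection are all assumptions made for illustration, and the names `score_window` and `align_sequences` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def score_window(src: List[str], tgt: List[str],
                 lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source tokens with a dictionary translation in tgt.

    Assumption: this simple coverage ratio stands in for the confidence
    measures mentioned in the abstract, which are not specified there.
    """
    if not src:
        return 0.0
    tgt_set = set(tgt)
    hits = sum(1 for w in src if lexicon.get(w, set()) & tgt_set)
    return hits / len(src)


def align_sequences(source: List[str], target: List[str],
                    lexicon: Dict[str, Set[str]],
                    window: int = 4,
                    threshold: float = 0.75) -> List[Tuple[int, int, float]]:
    """Return (source_start, target_start, score) for the kept windows.

    Assumptions: fixed-size token windows replace sentence units, and a
    greedy pass keeps the best-scoring, non-overlapping candidates.
    """
    candidates = []
    for i in range(max(1, len(source) - window + 1)):
        for j in range(max(1, len(target) - window + 1)):
            s = score_window(source[i:i + window],
                             target[j:j + window], lexicon)
            if s >= threshold:
                candidates.append((s, i, j))

    # Highest score first; ties are broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_src: Set[int] = set()
    used_tgt: Set[int] = set()
    kept = []
    for s, i, j in candidates:
        src_span = set(range(i, i + window))
        tgt_span = set(range(j, j + window))
        # Precision over recall: drop anything that overlaps a kept window.
        if src_span & used_src or tgt_span & used_tgt:
            continue
        used_src |= src_span
        used_tgt |= tgt_span
        kept.append((i, j, s))
    return sorted(kept)
```

With a toy lexicon such as `{"a": {"A"}, "b": {"B"}, "c": {"C"}, "d": {"D"}}`, the source tokens `a b c d x y`, and the target tokens `q A B C D z`, only the fully covered window pair survives, and lower-scoring overlapping candidates are discarded. For texts without standardized spelling, a real system would presumably also need fuzzy dictionary lookup, which this sketch omits.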