This paper introduces a number of quality metrics that can be used to automatically detect incorrectly aligned segment pairs. This is an important issue in commercial machine translation, because segmentation and alignment of bilingual corpora are often performed by third parties whose quality assurances cannot always be relied upon. The metrics are based on the normalized logarithm of the alignment score of a segment pair, where the alignment score is calculated using IBM translation Model 4. The alignment quality metrics are evaluated in classification experiments on a Chinese-English patent translation task and are shown to yield satisfactory performance.
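
As a rough illustration of the kind of metric described above, the following sketch normalizes a log alignment score by segment length and flags pairs that fall below a cutoff. It is not the paper's exact formulation: the abstract does not say how the score is normalized, so dividing by the combined token count is an assumption, as are the function names and the threshold value. The log score itself is assumed to come from an external IBM Model 4 aligner and is not computed here.

```python
def normalized_log_score(log_score: float, src_len: int, tgt_len: int) -> float:
    """Length-normalize a log alignment score.

    Assumption: normalization divides by the combined number of source
    and target tokens; the paper may use a different normalizer.
    """
    total = src_len + tgt_len
    if total <= 0:
        raise ValueError("segment pair must contain at least one token")
    return log_score / total


def is_misaligned(log_score: float, src_len: int, tgt_len: int,
                  threshold: float = -3.0) -> bool:
    """Flag a segment pair whose normalized log score is below the threshold.

    The default threshold is a hypothetical value; in practice it would be
    tuned on held-out pairs labeled as correctly or incorrectly aligned.
    """
    return normalized_log_score(log_score, src_len, tgt_len) < threshold


if __name__ == "__main__":
    # Hypothetical (log score, source length, target length) triples.
    pairs = [(-10.0, 5, 5), (-40.0, 5, 5)]
    for log_score, src_len, tgt_len in pairs:
        print(log_score, is_misaligned(log_score, src_len, tgt_len))
```

Working with the log score directly, rather than the raw probability, avoids floating-point underflow on long segments, where Model 4 probabilities become extremely small.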