Fundamentals of speech recognition
Fundamentals of speech recognition
A program for aligning sentences in bilingual corpora
Computational Linguistics - Special issue on using large corpora: I
The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
Two languages are more informative than one
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Aligning sentences in parallel corpora
ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Char_align: a program for aligning parallel texts at the character level
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Aligning sentences in bilingual corpora using lexical information
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics
Aligning a parallel English-Chinese corpus statistically with lexical criteria
ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
A comparison of alignment models for statistical machine translation
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 2
HMM-based word alignment in statistical translation
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2
Adaptive Bilingual Sentence Alignment
AMTA '02 Proceedings of the 5th Conference of the Association for Machine Translation in the Americas on Machine Translation: From Research to Real Users
Bilingual sentence alignment based on punctuation statistics and lexicon
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing
Hi-index | 0.00 |
Most current sentence alignment approaches adopt sentence length and cognate as the alignment features; and they are mostly trained and tested in the documents with the same style. Since the length distribution, alignment-type distribution (used by length-based approaches) and cognate frequency vary significantly across texts with different styles, the length-based approaches fail to achieve similar performance when tested in corpora of different styles. The experiments show that the performance in F-measure could drop from 98.2% to 85.6% when a length-based approach is trained by a technical manual and then tested on a general magazine.Since a large percentage of content words in the source text would be translated into the corresponding translation duals to preserve the meaning in the target text, transfer lexicons are usually regarded as more reliable cues for aligning sentences when the alignment task is performed by human. To enhance the robustness, a robust statistical model based on both transfer lexicons and sentence lengths are proposed in this paper. After integrating the transfer lexicons into the model, a 60% F-measure error reduction (from 14.4% to 5.8%) is observed.