Segmentation of bilingual text corpora is an important problem in machine translation. In this paper we present SPBalign, a new method for bilingual segmentation of a parallel corpus that is based on phrase-based statistical translation models. We compare the new technique with two existing techniques that are also based on statistical translation methods: RECalign, which is based on the concept of recursive alignment, and GIATIalign, which is based on simple word alignments. Experiments are carried out on the EuTrans-I English-Spanish task, with the goal of creating new, shorter bilingual segments to be included in a translation memory database. The three methods are evaluated by comparing the bilingual segmentations they produce against a manually segmented bilingual test corpus. The results show that the new method outperforms the two previously proposed bilingual segmentation techniques in all cases.
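The abstract does not state which measure is used to compare an automatic segmentation with the manual one. As a minimal sketch of one plausible choice, the code below scores a hypothesis segmentation by precision, recall and F1 over exactly matching bilingual segments. The span representation, the function names `spans_from_lengths` and `segmentation_scores`, the monotone pairing of source and target spans, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Set, Tuple

# Assumed representation: a bilingual segment pairs a source span with a
# target span; spans are (start, end) word indices with end exclusive.
Span = Tuple[int, int]
Segment = Tuple[Span, Span]


def spans_from_lengths(lengths: List[int]) -> List[Span]:
    """Turn consecutive segment lengths into (start, end) spans."""
    spans, start = [], 0
    for n in lengths:
        spans.append((start, start + n))
        start += n
    return spans


def segmentation_scores(
    hypothesis: Set[Segment], reference: Set[Segment]
) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exactly matching bilingual segments."""
    matched = len(hypothesis & reference)
    precision = matched / len(hypothesis) if hypothesis else 0.0
    recall = matched / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Toy sentence pair of 5 source and 5 target words (hypothetical data).
# Source and target spans are paired in order purely for illustration.
reference = set(zip(spans_from_lengths([2, 3]), spans_from_lengths([2, 3])))
hypothesis = set(zip(spans_from_lengths([2, 1, 2]), spans_from_lengths([2, 1, 2])))

precision, recall, f1 = segmentation_scores(hypothesis, reference)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Exact matching is a strict criterion: a hypothesis that splits one reference segment into two receives no credit for either half. A boundary-based score would be more lenient, and which of the two better reflects usefulness for a translation memory is a design choice this sketch leaves open.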