Mining a comparable text corpus for a Vietnamese - French statistical machine translation system

Authors:
Thi-Ngoc-Diep Do;Viet-Bac Le;Brigitte Bigi;Laurent Besacier;Eric Castelli
Affiliations:
LIG Laboratory, Grenoble, France and MICA Center, Hanoi, Vietnam;LIG Laboratory, Grenoble, France;LIG Laboratory, Grenoble, France;LIG Laboratory, Grenoble, France;MICA Center, Hanoi, Vietnam
Venue:
StatMT '09 Proceedings of the Fourth Workshop on Statistical Machine Translation
Year:
2009

Citing 11
Cited 2

Introduction to the special issue on the web as corpus

Computational Linguistics - Special issue on web as corpus
The Web as a parallel corpus

Computational Linguistics - Special issue on web as corpus
Embedding web-based statistical translation models in cross-language information retrieval

Computational Linguistics - Special issue on web as corpus
Text-translation alignment

Computational Linguistics - Special issue on using large corpora: I
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
Aligning sentences in parallel corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
A program for aligning sentences in bilingual corpora

ACL '91 Proceedings of the 29th annual meeting on Association for Computational Linguistics
Statistical phrase-based translation

NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Generation of Vietnamese for French-Vietnamese and English-Vietnamese machine translation

EWNLG '01 Proceedings of the 8th European workshop on Natural Language Generation - Volume 8
Extracting parallel sub-sentential fragments from non-parallel corpora

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions

A hybrid morpheme-word representation for machine translation of morphologically rich languages

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Large scale parallel document mining for machine translation

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents our first attempt at constructing a Vietnamese-French statistical machine translation system. Since Vietnamese is an under-resourced language, we concentrate on building a large Vietnamese-French parallel corpus. A document alignment method based on publication date, special words and sentence alignment result is proposed. The paper also presents an application of the obtained parallel corpus to the construction of a Vietnamese-French statistical machine translation system, where the use of different units for Vietnamese (syllables, words, or their combinations) is discussed.