Dialect translation: integrating Bayesian co-segmentation models with pivot-based SMT

Authors:
Michael Paul;Andrew Finch;Paul R. Dixon;Eiichiro Sumita
Affiliations:
National Institute of Information and Communications Technology, Kyoto, Japan;National Institute of Information and Communications Technology, Kyoto, Japan;National Institute of Information and Communications Technology, Kyoto, Japan;National Institute of Information and Communications Technology, Kyoto, Japan
Venue:
DIALECTS '11 Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties
Year:
2011

Citing 19
Cited 0

A systematic comparison of various statistical alignment models

Computational Linguistics
The mathematics of statistical machine translation: parameter estimation

Computational Linguistics - Special issue on using large corpora: II
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Acquisition of English-Chinese transliterated word pairs from parallel-aligned texts using a statistical machine transliteration model

HLT-NAACL-PARALLEL '03 Proceedings of the HLT-NAACL 2003 Workshop on Building and using parallel texts: data driven machine translation and beyond - Volume 3
Automatic generation of Japanese–English bilingual thesauri based on bilingual corpora

Journal of the American Society for Information Science and Technology - Research Articles
A joint source-channel model for machine transliteration

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Dialect classification for online podcasts fusing acoustic and language based structural and semantic information

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Adaptive string distance measures for bilingual dialect lexicon induction

ACL '07 Proceedings of the 45th Annual Meeting of the ACL: Student Research Workshop
Bayesian semi-supervised Chinese word segmentation for statistical machine translation

COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Sampling alignment structure under a Bayesian translation model

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
On the importance of pivot language selection for statistical machine translation

NAACL-Short '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers
Morphological analysis and generation for Arabic dialects

Semitic '05 Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages
Evaluation of string distance algorithms for dialectology

LD '06 Proceedings of the Workshop on Linguistic Distances
Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
OpenFst: a general and efficient weighted finite-state transducer library

CIAA'07 Proceedings of the 12th international conference on Implementation and application of automata
Transliteration generation and mining with limited training resources

NEWS '10 Proceedings of the 2010 Named Entities Workshop
Language independent transliteration mining system using finite state automata framework

NEWS '10 Proceedings of the 2010 Named Entities Workshop
Word segmentation for dialect translation

CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II
Comparative study on corpora for speech translation

IEEE Transactions on Audio, Speech, and Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recent research on multilingual statistical machine translation (SMT) focuses on the usage of pivot languages in order to overcome resource limitations for certain language pairs. This paper proposes a new method to translate a dialect language into a foreign language by integrating transliteration approaches based on Bayesian co-segmentation (BCS) models with pivot-based SMT approaches. The advantages of the proposed method with respect to standard SMT approaches are three fold: (1) it uses a standard language as the pivot language and acquires knowledge about the relation between dialects and the standard language automatically, (2) it reduces the translation task complexity by using monotone decoding techniques, (3) it reduces the number of features in the log-linear model that have to be estimated from bilingual data. Experimental results translating four Japanese dialects (Kumamoto, Kyoto, Okinawa, Osaka) into four Indo-European languages (English, German, Russian, Hindi) and two Asian languages (Chinese, Korean) revealed that the proposed method improves the translation quality of dialect translation tasks and outperforms standard pivot translation approaches concatenating SMT engines for the majority of the investigated language pairs.