Personalising speech-to-speech translation in the EMIME project

  • Authors and affiliations:
  • Mikko Kurimo (Aalto University, Finland); William Byrne (University of Cambridge, UK); John Dines (Idiap Research Institute, Switzerland); Philip N. Garner (Idiap Research Institute, Switzerland); Matthew Gibson (University of Cambridge, UK); Yong Guan (Nokia Research Center, Beijing, China); Teemu Hirsimäki (Aalto University, Finland); Reima Karhila (Aalto University, Finland); Simon King (University of Edinburgh, UK); Hui Liang (Idiap Research Institute, Switzerland); Keiichiro Oura (Nagoya Institute of Technology, Japan); Lakshmi Saheer (Idiap Research Institute, Switzerland); Matt Shannon (University of Cambridge, UK); Sayaka Shiota (Nagoya Institute of Technology, Japan); Jilei Tian (Nokia Research Center, Beijing, China); Keiichi Tokuda (Nagoya Institute of Technology, Japan); Mirjam Wester (University of Edinburgh, UK); Yi-Jian Wu (Nagoya Institute of Technology, Japan); Junichi Yamagishi (University of Edinburgh, UK)

  • Venue:
  • ACLDemos '10 Proceedings of the ACL 2010 System Demonstrations
  • Year:
  • 2010

Abstract

In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed a hidden Markov model (HMM) statistical framework for both speech recognition and synthesis, which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the voice recognized in ASR (automatic speech recognition). An important application of this research is personalised speech-to-speech translation, which uses the voice of the speaker in the input language to utter the translated sentences in the output language. In mobile environments this enhances users' interaction across language barriers by making the output speech sound more like the original speaker's way of speaking, even if he or she cannot speak the output language.
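The abstract describes a pipeline of ASR, machine translation, and TTS, with a speaker transform estimated from the input speech and applied to the synthesized output. The following toy sketch illustrates only that data flow; all function names and data structures are illustrative assumptions and are not taken from the EMIME system (real systems would decode audio with HMMs and estimate e.g. linear-transform adaptation from ASR statistics).

```python
# Illustrative sketch of a personalised speech-to-speech translation
# pipeline: recognise -> estimate speaker transform -> translate ->
# synthesise with the adapted voice. All names here are hypothetical.

def recognise(audio):
    """ASR stand-in: return recognised text plus speaker statistics
    gathered from the input speech (unsupervised, no transcript)."""
    stats = {"speaker": audio["speaker"]}
    return audio["utterance"], stats

def estimate_transform(stats):
    """Estimate a speaker transform from the ASR-side statistics."""
    return {"voice_of": stats["speaker"]}

def translate(text, src, tgt):
    """MT stand-in: toy dictionary lookup instead of real translation."""
    lexicon = {("en", "fi"): {"hello": "hei"}}
    return " ".join(lexicon[(src, tgt)].get(w, w) for w in text.split())

def synthesise(text, transform):
    """TTS stand-in: synthesise target-language output with the
    adapted voice (represented here as a tag, not real audio)."""
    return {"text": text, "voice": transform["voice_of"]}

def speech_to_speech(audio, src="en", tgt="fi"):
    text, stats = recognise(audio)
    transform = estimate_transform(stats)   # cross-lingual adaptation
    translated = translate(text, src, tgt)
    return synthesise(translated, transform)  # speaker's own voice

out = speech_to_speech({"utterance": "hello", "speaker": "alice"})
print(out)  # {'text': 'hei', 'voice': 'alice'}
```

The key point the sketch captures is that the adaptation data comes from the ASR side in the input language, while the transform is applied on the TTS side in the output language.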