Enriching machine-mediated speech-to-speech translation using contextual information

Authors:
Vivek Kumar Rangarajan Sridhar;Srinivas Bangalore;Shrikanth Narayanan
Affiliations:
AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932, United States;AT&T Labs - Research, 180 Park Avenue, Florham Park, NJ 07932, United States;University of Southern California, Ming Hsieh Department of Electrical Engineering, 3740 McClintock Avenue, Room EEB430, Los Angeles, CA 90089-2564, United States
Venue:
Computer Speech and Language
Year:
2013

Abstract

Conventional approaches to speech-to-speech (S2S) translation typically ignore key contextual information, such as prosody, emphasis, and discourse state, in the translation process. Capturing and exploiting such contextual information is especially important in machine-mediated S2S translation, as it can serve as a complementary knowledge source that can potentially help end users with understanding and disambiguation. In this work, we present a general framework for integrating rich contextual information in S2S translation. We present novel methodologies for integrating source-side context in the form of dialog act (DA) tags, and target-side context using prosodic word prominence. We demonstrate the integration of the DA tags in two different statistical translation frameworks: phrase-based translation and a bag-of-words lexical choice model. In addition to producing interpretable DA-annotated target language translations, we also obtain significant improvements in terms of automatic evaluation metrics such as lexical selection accuracy and BLEU score. Our experiments also indicate that finer-grained representations of dialog information, such as yes-no questions, wh-questions, and open questions, are the most useful in improving translation quality. For target-side enrichment, we employ factored translation models to integrate the assignment and transfer of prosodic word prominence (pitch accents) during translation. The factored translation models provide significant improvement in the assignment of correct pitch accents to the target words in comparison with a post-processing approach. Our framework is suitable for integrating any word- or utterance-level contextual information that can be reliably detected (recognized) from speech and/or text.