Enabling effective design of multimodal interfaces for speech-to-speech translation system: An empirical study of longitudinal user behaviors over time and user strategies for coping with errors

Authors:
Jongho Shin, Panayiotis G. Georgiou, Shrikanth Narayanan
Affiliations:
Signal Analysis and Interpretation Laboratory (SAIL), USC Viterbi School of Engineering, Los Angeles, CA 90089, United States (all three authors)
Venue:
Computer Speech and Language
Year:
2013

Abstract

This study provides an empirical analysis of long-term changes in user behavior and of varying user strategies during cross-lingual interaction using the multimodal speech-to-speech (S2S) translation system of USC/SAIL. The goal is to inform user-adaptive designs of such systems. A 4-week, medical-scenario-based study provides the basis for our analysis. The data analyzed include user interviews, post-session surveys, and extensive system logs that were post-processed and annotated. The annotations measured meaning transfer rates using human evaluations and a scale, defined here, called the concept matching score. First, a qualitative data analysis investigates user strategies for dealing with errors, such as repeating, rephrasing, changing the topic, and starting over, as well as the participants' self-reported longitudinal adaptation to errors. Post-session surveys explore participants' experience with the system and point to a trend of increasing user-perceived performance over time. The log data analysis provides further insights. Users chose to let utterances with some degradation of their intended meaning (84% of the original concepts retained) proceed through the system, even after they observed potential errors in the visual output of the speech recognizer. The rejected utterances, on average, retained only 25% of the original concepts. As a result of this user filtering, after complete transfer through the S2S system, 91% of the successful turns convey at least half of the intended concepts, whereas 90% of the user-rejected turns would have conveyed less than half of the intended meaning. Compared with the speech-only modality, the multimodal interface yields a 24% relative improvement in the confirmation mode and a 31% relative improvement in the choice mode. The analysis also shows that users of the multimodal interface change their strategies over time, accepting more system-produced choices.
This behavior can expedite communication and reflects users seeking an operating balance between their strategies and system performance factors. Finally, user utterance length is analyzed. Longer utterances generally deliver more information per utterance, but potentially at the cost of greater processing degradation. The analysis demonstrates that users shorten their utterances after unsuccessful turns and lengthen them after successful turns, and that a learning effect strengthens this behavior over the course of the study.
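The turn-level measures summarized above can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' method: the `Turn` record, the helper names, and the set-overlap `concept_ratio` (a stand-in for the human-annotated concept matching score, whose exact scale is not given here) are all hypothetical, and treating an accepted turn as a "successful" one is an assumption. It computes mean concept retention for accepted and rejected turns, the share of accepted turns conveying at least half of the intended concepts and of rejected turns conveying less than half, relative improvement as (new minus baseline) divided by baseline, and the mean change in utterance length on the turn after an accepted versus a rejected turn.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Turn:
    """One user turn. Hypothetical record layout, not the study's log format."""
    intended: set[str]  # concepts the user meant to convey
    conveyed: set[str]  # concepts surviving recognition and translation
    accepted: bool      # assumption: accepted == "successful" turn
    n_words: int        # utterance length in words


def concept_ratio(turn: Turn) -> float:
    """Fraction of intended concepts preserved. A set-overlap stand-in
    for the human-annotated concept matching score (assumption)."""
    if not turn.intended:
        return 0.0
    return len(turn.intended & turn.conveyed) / len(turn.intended)


def filtering_summary(turns: list[Turn]) -> dict[str, float]:
    """Concept retention for user-accepted vs. user-rejected turns."""
    accepted = [concept_ratio(t) for t in turns if t.accepted]
    rejected = [concept_ratio(t) for t in turns if not t.accepted]
    if not accepted or not rejected:
        raise ValueError("need at least one accepted and one rejected turn")
    return {
        "mean_accepted": mean(accepted),
        "mean_rejected": mean(rejected),
        # share of accepted turns conveying at least half the concepts
        "accepted_at_least_half": sum(r >= 0.5 for r in accepted) / len(accepted),
        # share of rejected turns that would have conveyed less than half
        "rejected_below_half": sum(r < 0.5 for r in rejected) / len(rejected),
    }


def relative_improvement(new: float, baseline: float) -> float:
    """Relative (not absolute, percentage-point) improvement over a baseline."""
    return (new - baseline) / baseline


def length_change_after(turns: list[Turn]) -> dict[bool, float]:
    """Mean change in utterance length on the next turn, keyed by whether
    the previous turn was accepted (True) or rejected (False)."""
    deltas: dict[bool, list[int]] = {True: [], False: []}
    for prev, nxt in zip(turns, turns[1:]):
        deltas[prev.accepted].append(nxt.n_words - prev.n_words)
    return {k: mean(v) for k, v in deltas.items() if v}


if __name__ == "__main__":
    # Made-up toy turns for illustration only; not data from the study.
    toy = [
        Turn({"pain", "chest", "since", "yesterday"},
             {"pain", "chest", "since", "yesterday"}, True, 6),
        Turn({"take", "medicine", "daily"}, {"take", "medicine"}, True, 5),
        Turn({"allergy", "penicillin", "rash", "swelling"}, {"rash"}, False, 8),
        Turn({"allergy", "penicillin"}, {"allergy", "penicillin"}, True, 4),
    ]
    print(filtering_summary(toy))
    print(length_change_after(toy))
```

Keeping relative improvement as a separate helper makes explicit that a "24% relative improvement" is measured against the speech-only baseline value rather than as a 24-point absolute gain.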