Evaluating a synthetic talking head using a dual task: Modality effects on speech understanding and cognitive load

Authors:
Catherine J. Stevens; Guillaume Gibert; Yvonne Leung; Zhengzhi Zhang
Affiliations:
MARCS Institute, University of Western Sydney, Australia (all authors)
Venue:
International Journal of Human-Computer Studies
Year:
2013

Abstract

The dual task is a data-rich paradigm for evaluating the speech modes of a synthetic talking head. Three experiments manipulated auditory-visual (AV) and auditory-only (A-only) speech: text-to-speech synthesis from a talking head (Experiment 1, single task; Experiment 2, dual task) and natural speech produced by a human male similar in appearance to the talking head (Experiment 3, dual task). In a dual task, participants perform two tasks concurrently, and the secondary reaction time (RT) task is sensitive to the cognitive processing demands of the primary task. In the primary task, participants either shadowed words or named the superordinate categories to which the words belonged, under AV (dynamic face with lips moving) or A-only (static face) speech modes. First, it was hypothesized that category naming is more difficult than shadowing. This hypothesis was supported in each experiment by significantly longer latencies on the primary task and slower RT on the secondary task. Second, an AV advantage was hypothesized; it was supported by significantly shorter latencies for the AV modality on the primary task of Experiment 3, with partial support in Experiment 1. Third, it was hypothesized that while the AV modality helps, it also creates greater cognitive load. Significantly longer RT for AV presentation in the secondary tasks supported this hypothesis. The results indicate that task difficulty influences speech perception. Performance on a secondary task can reveal cognitive demand that is not evident in a single task or in self-report ratings. A dual task will be an effective evaluation tool in operational environments where multiple tasks are conducted (e.g., responding to spoken directions and monitoring displays) and an implicit, sensitive measure of cognitive load is imperative.
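
As a rough illustration of how the two dependent measures in the abstract relate, the sketch below averages primary-task latency and secondary-task RT per task and modality cell, then takes the A-only minus AV difference. It is not the authors' analysis: the field names (`task`, `modality`, `primary_ms`, `secondary_ms`), both function names, and all trial values are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def condition_means(trials):
    """Average both measures within each (task, modality) cell.

    `trials` is an iterable of dicts with hypothetical keys:
    task, modality, primary_ms, secondary_ms.
    """
    cells = defaultdict(lambda: {"primary_ms": [], "secondary_ms": []})
    for t in trials:
        cell = cells[(t["task"], t["modality"])]
        cell["primary_ms"].append(t["primary_ms"])
        cell["secondary_ms"].append(t["secondary_ms"])
    return {
        key: {measure: mean(values) for measure, values in cell.items()}
        for key, cell in cells.items()
    }


def modality_difference(means, task, measure):
    """A-only mean minus AV mean for one task; positive means AV was faster."""
    return means[(task, "A-only")][measure] - means[(task, "AV")][measure]


# Made-up placeholder trials, NOT values from the study.
trials = [
    {"task": "shadowing", "modality": "AV", "primary_ms": 600, "secondary_ms": 450},
    {"task": "shadowing", "modality": "AV", "primary_ms": 620, "secondary_ms": 470},
    {"task": "shadowing", "modality": "A-only", "primary_ms": 650, "secondary_ms": 420},
    {"task": "shadowing", "modality": "A-only", "primary_ms": 670, "secondary_ms": 440},
]

means = condition_means(trials)
primary_diff = modality_difference(means, "shadowing", "primary_ms")
secondary_diff = modality_difference(means, "shadowing", "secondary_ms")
print(primary_diff, secondary_diff)
```

With placeholder values arranged this way, a positive primary-task difference alongside a negative secondary-task difference would correspond to the pattern the abstract describes: AV speech speeds the primary response while slowing the concurrent RT task. A real analysis would also need inferential statistics across participants, which this sketch omits.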