Visual contribution to speech perception: measuring the intelligibility of animated talking heads

Authors:
Slim Ouni;Michael M. Cohen;Hope Ishak;Dominic W. Massaro
Affiliations:
LORIA, Campus Scientifique, Vandoeuvre-lès-Nancy Cedex, France;Perceptual Science Laboratory, University of California, Santa Cruz, CA;Perceptual Science Laboratory, University of California, Santa Cruz, CA;Perceptual Science Laboratory, University of California, Santa Cruz, CA
Venue:
EURASIP Journal on Audio, Speech, and Music Processing
Year:
2007

Citing 1
Cited 4

Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship

Read my lips: speech distortions in musical lyrics can be overcome (slightly) by facial information

Speech Communication
Emphatic visual speech synthesis

IEEE Transactions on Audio, Speech, and Language Processing - Special issue on multimodal processing in speech-based interactions
Control of speech-related facial movements of an avatar from video

Speech Communication
Evaluating a synthetic talking head using a dual task: Modality effects on speech understanding and cognitive load

International Journal of Human-Computer Studies

Abstract

Animated agents are becoming increasingly common in speech science research and applications. An important challenge is to evaluate the effectiveness of an agent in terms of the intelligibility of its visible speech. In three experiments, we extend and test the Sumby and Pollack (1954) metric to allow the comparison of an agent relative to a standard or reference. We also propose a new metric, based on the fuzzy logical model of perception (FLMP), that describes the benefit provided by a synthetic animated face relative to the benefit provided by a natural face. A valid metric would allow direct comparisons across different experiments. It would also measure the benefit of a synthetic animated face relative to a natural face (or indeed any two conditions) and show how this benefit varies as a function of the type of synthetic face, the test items (e.g., syllables versus sentences), different individuals, and applications.
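
As a rough illustration of the kind of measure the abstract refers to, the Sumby and Pollack (1954) visual contribution is commonly written as (AV - A) / (1 - A), where A is the proportion correct with audio alone and AV is the proportion correct with audio plus the face. The Python sketch below computes that quantity and then, purely as an assumption for illustration, takes the ratio of the synthetic-face contribution to the natural-face contribution. The function names and the ratio form are hypothetical and are not taken from the paper, whose extended metric and FLMP-based metric may be defined differently.

```python
def visual_contribution(auditory: float, bimodal: float) -> float:
    """Sumby-Pollack style visual contribution: (AV - A) / (1 - A).

    Both arguments are proportions correct in [0, 1].
    """
    for score in (auditory, bimodal):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be proportions in [0, 1]")
    if auditory == 1.0:
        # No room for improvement, so the measure is undefined at ceiling.
        raise ValueError("auditory score is at ceiling")
    return (bimodal - auditory) / (1.0 - auditory)


def relative_contribution(auditory: float,
                          bimodal_synthetic: float,
                          bimodal_natural: float) -> float:
    """Hypothetical ratio of synthetic-face to natural-face contribution.

    This ratio is an illustrative assumption, not the paper's definition.
    """
    natural = visual_contribution(auditory, bimodal_natural)
    if natural == 0.0:
        raise ValueError("natural face gives no benefit; ratio undefined")
    return visual_contribution(auditory, bimodal_synthetic) / natural
```

With made-up scores of 0.4 for audio alone, 0.7 with a synthetic face, and 0.8 with a natural face, the synthetic face recovers half of the available improvement and about three quarters of the benefit the natural face provides.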