Human perception of audio-visual synthetic character emotion expression in the presence of ambiguous and conflicting information

Authors:
Emily Mower; Maja J. Mataric; Shrikanth Narayanan
Affiliations:
Department of Electrical Engineering, University of Southern California, University Park, Los Angeles, CA; Department of Computer Science, University of Southern California, University Park, Los Angeles, CA; Department of Electrical Engineering and Department of Computer Science, University of Southern California, University Park, Los Angeles, CA
Venue:
IEEE Transactions on Multimedia
Year:
2009

Affective computing

Machine Learning

Abstract

Computer-simulated avatars and humanoid robots have an increasingly prominent place in today's world. Acceptance of these synthetic characters depends on their ability to convey basic emotion states properly and recognizably to a user population. This study presents an analysis of the interaction between emotional audio (human voice) and video (simple animation) cues. The emotional relevance of the channels is analyzed with respect to their effect on human perception and through the study of the extracted audio-visual features that contribute most prominently to human perception. As a result of the unequal level of expressivity across the two channels, the audio was shown to bias the perception of the evaluators. However, even in the presence of a strong audio bias, the video data were shown to affect human perception. The feature sets extracted from emotionally matched audio-visual displays contained both audio and video features, while feature sets resulting from emotionally mismatched audio-visual displays contained only audio information. This result indicates that observers integrate natural audio cues and synthetic video cues only when the information expressed is congruent. It is therefore important to design the presentation of audio-visual cues properly, as incorrect design may cause observers to ignore the information conveyed in one of the channels.