Selection of Emotionally Salient Audio-Visual Features for Modeling Human Evaluations of Synthetic Character Emotion Displays

  • Authors: Emily Mower, Maja J. Mataric, Shrikanth Narayanan

  • Venue: ISM '08: Proceedings of the Tenth IEEE International Symposium on Multimedia
  • Year: 2008

Abstract

Computer-simulated avatars and humanoid robots have an increasingly prominent place in today's world. Acceptance of these synthetic characters depends on their ability to properly and recognizably convey basic emotional states to a user population. This study presents an analysis of audio-visual features that can be used to predict user evaluations of synthetic character emotion displays. These features include prosodic, spectral, and semantic properties of audio signals, in addition to FACS-inspired video features. The goal of this paper is to identify the audio-visual features that explain the variance in the emotional evaluations of naive listeners, using information gain feature selection in conjunction with support vector machines. The results suggest that there exists an emotionally salient subset of the audio-visual feature space. The features that contribute most to the explanation of evaluator variance are the prior-knowledge audio statistics (e.g., average valence rating), the high-energy-band spectral components, and the quartile pitch range. This feature subset should be correctly modeled and implemented in the design of synthetic expressive displays to convey the desired emotions.
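The abstract's feature-selection step can be illustrated with a minimal sketch of information gain ranking. This is not the authors' implementation: the feature names and toy data below are hypothetical stand-ins for the discretized audio-visual features the paper describes, and the gain computation uses only the standard definition IG(Y; X) = H(Y) − H(Y | X) for discrete features.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(Y) of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    n = len(labels)
    cond = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - cond

# Toy data: rows are utterances; features are binarized stand-ins
# (hypothetical names, not the paper's actual feature set).
features = {
    "pitch_range":     [0, 0, 1, 1, 1, 0, 1, 0],
    "spectral_energy": [1, 0, 1, 0, 1, 0, 1, 0],
    "eyebrow_motion":  [0, 1, 0, 1, 1, 0, 0, 1],
}
labels = ["angry", "sad", "angry", "sad", "angry", "sad", "angry", "sad"]

# Rank features by information gain; a salient subset would be the top k.
ranked = sorted(features, key=lambda f: information_gain(features[f], labels),
                reverse=True)
print(ranked)
```

In a full pipeline, the top-ranked subset would then be fed to an SVM classifier, as the abstract describes; a library such as scikit-learn provides both steps (`mutual_info_classif`, `SVC`) for real-valued features.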