Analysis of emotion recognition using facial expressions, speech and multimodal information

  • Authors:
  • Carlos Busso, Zhigang Deng, Serdar Yildirim, Murtaza Bulut, Chul Min Lee, Abe Kazemzadeh, Sungbok Lee, Ulrich Neumann, Shrikanth Narayanan

  • Affiliations:
  • University of Southern California, Los Angeles (all authors)

  • Venue:
  • Proceedings of the 6th International Conference on Multimodal Interfaces (ICMI '04)
  • Year:
  • 2004

Abstract

The interaction between human beings and computers will be more natural if computers are able to perceive and respond to human non-verbal communication such as emotions. Although several approaches have been proposed to recognize human emotions based on facial expressions or speech, relatively limited work has been done to fuse these two, and other, modalities to improve the accuracy and robustness of emotion recognition systems. This paper analyzes the strengths and limitations of systems based only on facial expressions or acoustic information. It also discusses two approaches for fusing these modalities: decision-level and feature-level integration. Using a database recorded from an actress, four emotions were classified: sadness, anger, happiness, and neutral state. Markers placed on her face allowed detailed facial motions to be captured with a motion capture system, in conjunction with simultaneous speech recordings. The results reveal that, for the emotions considered, the system based on facial expressions outperformed the system based on acoustic information alone. The results also show the complementarity of the two modalities: when they are fused, the performance and robustness of the emotion recognition system improve measurably.
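
The abstract contrasts decision-level and feature-level integration without detailing either. The sketch below illustrates the distinction under loose assumptions; it is not the authors' implementation. The feature dimensions, the logistic-regression classifiers, the synthetic data, and the posterior-averaging combination rule are all illustrative choices.

```python
# Minimal sketch of feature-level vs. decision-level fusion for the four
# emotion classes named in the abstract. All features and data here are
# synthetic stand-ins, not the paper's actual corpus or features.
import numpy as np
from sklearn.linear_model import LogisticRegression

EMOTIONS = ["sadness", "anger", "happiness", "neutral"]
rng = np.random.default_rng(0)

n = 200
X_face = rng.normal(size=(n, 12))   # hypothetical facial-marker motion stats
X_speech = rng.normal(size=(n, 8))  # hypothetical acoustic (e.g., prosodic) stats
y = rng.integers(0, len(EMOTIONS), size=n)

# Feature-level fusion: concatenate both modalities, train one classifier.
X_fused = np.hstack([X_face, X_speech])
clf_feature = LogisticRegression(max_iter=1000).fit(X_fused, y)
feature_level_pred = clf_feature.predict(X_fused)

# Decision-level fusion: train one classifier per modality, then combine
# their class posteriors (averaging here; product or weighted rules are
# common alternatives).
clf_face = LogisticRegression(max_iter=1000).fit(X_face, y)
clf_speech = LogisticRegression(max_iter=1000).fit(X_speech, y)
posteriors = (clf_face.predict_proba(X_face)
              + clf_speech.predict_proba(X_speech)) / 2.0
decision_level_pred = posteriors.argmax(axis=1)

print("feature-level:", [EMOTIONS[i] for i in feature_level_pred[:5]])
print("decision-level:", [EMOTIONS[i] for i in decision_level_pred[:5]])
```

The trade-off the sketch makes visible: in feature-level fusion a single classifier sees the concatenated facial and acoustic features and can model cross-modal correlations, whereas in decision-level fusion each modality is classified independently and only the class posteriors are combined, which degrades more gracefully when one modality is missing or unreliable.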