Speech Emotion Perception by Human and Machine

  • Authors:
  • Szabolcs Levente Tóth; David Sztahó; Klára Vicsi

  • Affiliations:
  • Department of Telecommunications and Media Informatics, Laboratory of Speech Acoustics, Budapest University of Technology and Economics, Budapest, Hungary 1111

  • Venue:
  • Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction
  • Year:
  • 2008

Abstract

Human speech contains and reflects information about the emotional state of the speaker. The importance of emotion research is growing in telematics, information technology, and even health services. Research into the mean acoustic parameters of emotions is a very complicated task. Emotions are mainly characterized by suprasegmental parameters, but segmental factors can also contribute to their perception, and these parameters vary within a single language, across speakers, etc. In the first part of our research, human emotion perception was examined. The steps of creating an emotional speech database are presented. The database contains recordings of 3 Hungarian sentences with 8 basic emotions pronounced by nonprofessional speakers. A comparison of perception test results obtained with this database showed recognition results similar to those of an earlier perception test carried out with professional actors and actresses. It was also established that hearing a neutral sentence by the same speaker before listening to the emotional expression does not aid perception of the emotion to any great extent. In the second part of our research, an automatic emotion recognition system was developed. Statistical methods (hidden Markov models, HMMs) were used to train a model for each emotion. Recognition was optimized by varying the acoustic preprocessing parameters and the number of states of the Markov models.
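The recognition scheme described above (one statistical model trained per emotion, with classification by the best-scoring model) can be illustrated with a deliberately simplified sketch. The paper uses multi-state HMMs over acoustic features; the toy below reduces each emotion model to a single diagonal-covariance Gaussian over feature frames, which is equivalent to a one-state HMM emission model. All names, the synthetic feature data, and the two-emotion setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_gaussian_model(frames):
    """Fit a diagonal-covariance Gaussian to feature frames
    (a stand-in for a single-state HMM emission model)."""
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6  # variance floor for numerical safety
    return mean, var

def log_likelihood(frames, model):
    """Total log-likelihood of an utterance's frames under one model."""
    mean, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var)
    return ll.sum()

def classify(frames, models):
    """Pick the emotion whose model best explains the utterance."""
    return max(models, key=lambda emo: log_likelihood(frames, models[emo]))

# Synthetic stand-ins for acoustic feature streams (e.g. MFCC-like vectors)
# of two emotion classes; real data would come from the speech database.
rng = np.random.default_rng(0)
train = {
    "neutral": rng.normal(0.0, 1.0, size=(500, 12)),
    "anger":   rng.normal(2.0, 1.5, size=(500, 12)),
}
models = {emo: fit_gaussian_model(x) for emo, x in train.items()}

test_utt = rng.normal(2.0, 1.5, size=(80, 12))  # anger-like utterance
print(classify(test_utt, models))  # prints "anger"
```

In the full HMM version, each emotion model additionally has several hidden states with transition probabilities, so varying the number of states (as the paper reports) trades temporal resolution against the amount of training data needed per state.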