Emotion recognition from speech using source, system, and prosodic features

  • Authors:
  • Shashidhar G. Koolagudi; K. Sreenivasa Rao

  • Affiliation:
  • School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302 (both authors)

  • Venue:
  • International Journal of Speech Technology
  • Year:
  • 2012

Abstract

In this work, source, system, and prosodic features of speech are explored for characterizing and classifying the underlying emotions. Owing to their complementary nature, the different speech features express emotions in different ways. Linear prediction residual samples chosen around glottal closure regions, together with glottal pulse parameters, represent the excitation source information. Linear prediction cepstral coefficients, extracted through simple block processing and through pitch-synchronous analysis, represent the vocal tract information. Global and local prosodic features, extracted from the gross statistics and the temporal dynamics of the sequences of duration, pitch, and energy values, represent the prosodic information. Emotion recognition models are developed using the above features separately and in combination. The simulated Telugu emotion database (IITKGP-SESC) is used to evaluate the proposed features, and the recognition results on IITKGP-SESC are compared with those on the internationally known Berlin emotion speech database (Emo-DB). Autoassociative neural networks, Gaussian mixture models, and support vector machines are used to develop the emotion recognition systems with the source, system, and prosodic features, respectively. A weighted combination of evidence is used to combine the outputs of the systems developed with the different features. The results show that each of the proposed speech features contributes toward emotion recognition, and that combining the features improves recognition performance, indicating their complementary nature.
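
As an illustration of the system (vocal tract) features, the sketch below computes linear prediction cepstral coefficients by simple block processing: frames are windowed, LP coefficients are estimated per frame, and the standard LP-to-cepstrum recursion converts them. The LP order (10), the frame and hop sizes, and the helper names are illustrative assumptions, not values taken from the paper; the pitch-synchronous variant described in the abstract would instead place analysis frames at glottal closure instants.

```python
import numpy as np
import librosa

def lpcc(frame, order=10, n_ceps=13):
    """LPCCs for one frame via the standard LP-to-cepstrum recursion."""
    # librosa.lpc returns A(z) = 1 + a1 z^-1 + ...; the prediction
    # coefficients alpha_k are the negated higher-order terms.
    a = librosa.lpc(frame, order=order)
    alpha = -a[1:]
    c = np.zeros(n_ceps)
    for m in range(1, n_ceps + 1):
        acc = alpha[m - 1] if m <= order else 0.0
        for k in range(max(1, m - order), m):
            acc += (k / m) * c[k - 1] * alpha[m - k - 1]
        c[m - 1] = acc
    return c

def lpcc_block(y, sr, frame_s=0.02, hop_s=0.01, order=10):
    """Frame-wise LPCCs via block processing (hypothetical helper)."""
    n, h = int(frame_s * sr), int(hop_s * sr)
    frames = librosa.util.frame(y, frame_length=n, hop_length=h).T
    win = np.hamming(n)
    return np.array([lpcc(f * win, order=order) for f in frames])
```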
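For the global prosodic features, a minimal sketch of gross statistics over pitch, energy, and duration might look as follows. The pYIN pitch search range, the use of RMS energy as the energy contour, and the duration measures are assumptions made for illustration; the paper's local features would additionally capture the temporal dynamics of these contours rather than only their summary statistics.

```python
import numpy as np
import librosa

def global_prosody(y, sr):
    """Gross statistics of pitch, energy, and duration (hypothetical helper)."""
    # Pitch contour via pYIN; unvoiced frames come back as NaN.
    f0, voiced, _ = librosa.pyin(y, fmin=60.0, fmax=400.0, sr=sr)
    f0 = f0[~np.isnan(f0)]
    # Short-time RMS energy as the energy contour.
    rms = librosa.feature.rms(y=y)[0]
    stats = lambda x: [x.mean(), x.std(), x.min(), x.max()]
    # Duration cues: utterance length in seconds and voiced-frame count.
    return np.array(stats(f0) + stats(rms) + [len(y) / sr, float(voiced.sum())])
```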
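The weighted combination of evidence can be sketched as a weighted sum of per-model emotion scores, with one score matrix per feature-specific model (AANN on source, GMM on system, SVM on prosodic features). The abstract does not give the weighting rule, so the generic weighted sum below is an assumption; in practice the weights would be tuned on held-out data and the scores normalized to a comparable range first.

```python
import numpy as np

def fuse_scores(score_list, weights):
    """Weighted combination of evidence from feature-specific models.

    score_list: one (n_utterances, n_emotions) score matrix per model,
    each normalized to a comparable range such as posteriors.
    weights: one scalar per model, summing to 1.
    """
    fused = sum(w * s for w, s in zip(weights, score_list))
    return fused.argmax(axis=1)  # predicted emotion index per utterance
```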