Emotion recognition from speech using source, system, and prosodic features

  • Authors:
  • Shashidhar G. Koolagudi; K. Sreenivasa Rao

  • Affiliation:
  • School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302 (both authors)

  • Venue:
  • International Journal of Speech Technology
  • Year:
  • 2012

Abstract

In this work, source, system, and prosodic features of speech are explored for characterizing and classifying the underlying emotions. Owing to their complementary nature, the different speech features express emotions in different ways. Linear prediction residual samples chosen around glottal closure regions, together with glottal pulse parameters, represent the excitation source information. Linear prediction cepstral coefficients, extracted through simple block processing and through pitch-synchronous analysis, represent the vocal tract information. Global and local prosodic features, extracted from the gross statistics and the temporal dynamics of the sequences of duration, pitch, and energy values, represent the prosodic information. Emotion recognition models are developed using the above features separately and in combination. The simulated Telugu emotion database (IITKGP-SESC) is used to evaluate the proposed features, and the recognition results on IITKGP-SESC are compared with those on the internationally known Berlin emotion speech database (Emo-DB). Autoassociative neural networks, Gaussian mixture models, and support vector machines are used to develop the emotion recognition systems with the source, system, and prosodic features, respectively. A weighted combination of evidence is used to combine the outputs of the systems developed with the different features. The results show that each of the proposed speech features contributes toward emotion recognition, and that combining the features improves recognition performance, indicating their complementary nature.
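
As an illustration of the system (vocal tract) features, the sketch below computes linear prediction cepstral coefficients by simple block processing: frames are windowed, LP coefficients are estimated per frame, and the standard LP-to-cepstrum recursion converts them. The LP order (10), the frame and hop sizes, and the helper names are illustrative assumptions, not values taken from the paper; the pitch-synchronous variant described in the abstract would instead place analysis frames at glottal closure instants.

```python
import numpy as np
import librosa

def lpcc(frame, order=10, n_ceps=13):
    """LPCCs for one frame via the standard LP-to-cepstrum recursion."""
    # librosa.lpc returns A(z) = 1 + a1 z^-1 + ...; the prediction
    # coefficients alpha_k are the negated higher-order terms.
    a = librosa.lpc(frame, order=order)
    alpha = -a[1:]
    c = np.zeros(n_ceps)
    for m in range(1, n_ceps + 1):
        acc = alpha[m - 1] if m <= order else 0.0
        for k in range(max(1, m - order), m):
            acc += (k / m) * c[k - 1] * alpha[m - k - 1]
        c[m - 1] = acc
    return c

def lpcc_block(y, sr, frame_s=0.02, hop_s=0.01, order=10):
    """Frame-wise LPCCs via block processing (hypothetical helper)."""
    n, h = int(frame_s * sr), int(hop_s * sr)
    frames = librosa.util.frame(y, frame_length=n, hop_length=h).T
    win = np.hamming(n)
    return np.array([lpcc(f * win, order=order) for f in frames])
```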
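For the global prosodic features, a minimal sketch of gross statistics over pitch, energy, and duration might look as follows. The pYIN pitch search range, the use of RMS energy as the energy contour, and the duration measures are assumptions made for illustration; the paper's local features would additionally capture the temporal dynamics of these contours rather than only their summary statistics.

```python
import numpy as np
import librosa

def global_prosody(y, sr):
    """Gross statistics of pitch, energy, and duration (hypothetical helper)."""
    # Pitch contour via pYIN; unvoiced frames come back as NaN.
    f0, voiced, _ = librosa.pyin(y, fmin=60.0, fmax=400.0, sr=sr)
    f0 = f0[~np.isnan(f0)]
    # Short-time RMS energy as the energy contour.
    rms = librosa.feature.rms(y=y)[0]
    stats = lambda x: [x.mean(), x.std(), x.min(), x.max()]
    # Duration cues: utterance length in seconds and voiced-frame count.
    return np.array(stats(f0) + stats(rms) + [len(y) / sr, float(voiced.sum())])
```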
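The weighted combination of evidence can be sketched as a weighted sum of per-model emotion scores, with one score matrix per feature-specific model (AANN on source, GMM on system, SVM on prosodic features). The abstract does not give the weighting rule, so the generic weighted sum below is an assumption; in practice the weights would be tuned on held-out data and the scores normalized to a comparable range first.

```python
import numpy as np

def fuse_scores(score_list, weights):
    """Weighted combination of evidence from feature-specific models.

    score_list: one (n_utterances, n_emotions) score matrix per model,
    each normalized to a comparable range such as posteriors.
    weights: one scalar per model, summing to 1.
    """
    fused = sum(w * s for w, s in zip(weights, score_list))
    return fused.argmax(axis=1)  # predicted emotion index per utterance
```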