Emotion recognition from speech using global and local prosodic features

Authors:
K. Sreenivasa Rao; Shashidhar G. Koolagudi; Ramu Reddy Vempada
Affiliations:
School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302 (all authors)
Venue:
International Journal of Speech Technology
Year:
2013

Abstract

This paper proposes global and local prosodic features, extracted at the sentence, word, and syllable levels, for recognizing emotion (affect) from speech. Duration, pitch, and energy values are used to represent the prosodic information. Global prosodic features represent gross statistics of the prosodic contours, such as mean, minimum, maximum, standard deviation, and slope. Local prosodic features represent the temporal dynamics of the prosody. The global and local features are analyzed separately and in combination at each level. Words and syllables at different positions (initial, middle, and final) are also examined separately to assess their contribution to emotion recognition. All studies are carried out on the simulated Telugu emotion speech corpus (IITKGP-SESC), and the results are compared with those obtained on the internationally known Berlin emotion speech corpus (Emo-DB). Support vector machines are used to develop the emotion recognition models. The results indicate that local prosodic features give better recognition performance than global prosodic features. Words in the final position of a sentence and syllables in the final position of a word carry more emotion-discriminative information than words and syllables in other positions.
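
To make the distinction between the two feature types concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a prosodic contour is already available as a list of per-frame values (for example, pitch in Hz). The global features are the gross statistics named in the abstract. The abstract does not say how local features are encoded, so representing the temporal dynamics by resampling the contour to a fixed number of points is an assumption made here for illustration. The function names and the `num_points` parameter are hypothetical.

```python
from statistics import mean, pstdev


def global_prosodic_features(contour):
    """Gross statistics of one prosodic contour (e.g. per-frame pitch values)."""
    n = len(contour)
    if n < 2:
        raise ValueError("contour needs at least two frames")
    t_mean = (n - 1) / 2          # mean of frame indices 0..n-1
    c_mean = mean(contour)
    # Least-squares slope of the contour against the frame index.
    num = sum((t - t_mean) * (c - c_mean) for t, c in enumerate(contour))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return {
        "mean": c_mean,
        "min": min(contour),
        "max": max(contour),
        "std": pstdev(contour),   # population standard deviation
        "slope": num / den,
    }


def local_prosodic_features(contour, num_points=5):
    """Fixed-length view of the contour shape (ASSUMED encoding of temporal
    dynamics): linear interpolation at num_points evenly spaced positions."""
    n = len(contour)
    if n < 2 or num_points < 2:
        raise ValueError("need at least two frames and two output points")
    resampled = []
    for k in range(num_points):
        pos = k * (n - 1) / (num_points - 1)   # fractional frame position
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        resampled.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return resampled
```

Under these assumptions, a steadily rising contour such as `[100.0, 110.0, 120.0, 130.0, 140.0]` yields a global slope of 10 per frame, while the local representation keeps the ordered shape of the contour. Either vector, or their concatenation, could then be passed to a classifier such as a support vector machine.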