This paper proposes a two-stage speech emotion recognition approach based on speaking rate. The emotions considered in this study are anger, disgust, fear, happiness, neutral, sadness, sarcasm, and surprise. In the first stage, the eight emotions are categorized by speaking rate into three broad groups, namely active (fast), normal, and passive (slow). In the second stage, these three broad groups are further classified into individual emotions using vocal tract characteristics. Gaussian mixture models (GMMs) are used to develop the emotion models. Emotion classification performance at the broad-group level, based on speaking rate, is found to be around 99% for speaker- and text-dependent cases. The overall emotion classification performance is observed to improve with the proposed two-stage approach. In the second stage, formant features are explored alongside spectral features to achieve robust emotion recognition in speaker-, gender-, and text-independent cases.
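The two-stage pipeline described above can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: the speaking-rate thresholds, the assignment of emotions to the active/normal/passive groups, and the use of a single 1-D Gaussian per emotion (in place of the paper's full GMMs over spectral and formant features) are all simplifying assumptions made here for clarity.

```python
import math

# Assumed grouping of the eight emotions by speaking rate (illustrative;
# the actual grouping is determined empirically in the paper).
RATE_GROUPS = {
    "active":  ("anger", "fear", "happiness", "surprise"),
    "normal":  ("neutral", "sarcasm"),
    "passive": ("sadness", "disgust"),
}

def broad_group(speaking_rate):
    """Stage 1: map speaking rate (syllables/s, assumed thresholds)
    to a broad emotion group."""
    if speaking_rate > 5.0:
        return "active"
    if speaking_rate > 3.5:
        return "normal"
    return "passive"

def gaussian_loglik(x, mean, var):
    """Log-likelihood of scalar feature x under a 1-D Gaussian
    (stand-in for a full GMM over vocal-tract features)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(speaking_rate, spectral_feature, emotion_models):
    """Stage 2: score only the emotions in the broad group chosen by
    Stage 1, and return the most likely one."""
    candidates = RATE_GROUPS[broad_group(speaking_rate)]
    return max(candidates,
               key=lambda e: gaussian_loglik(spectral_feature,
                                             *emotion_models[e]))

# Toy per-emotion (mean, variance) models of one spectral feature;
# in practice these would be GMMs trained on spectral/formant vectors.
models = {
    "anger": (0.8, 0.02), "fear": (0.6, 0.02),
    "happiness": (0.7, 0.02), "surprise": (0.9, 0.02),
    "neutral": (0.4, 0.02), "sarcasm": (0.5, 0.02),
    "sadness": (0.2, 0.02), "disgust": (0.3, 0.02),
}

print(classify(5.6, 0.82, models))  # fast speech: only active emotions scored
print(classify(2.8, 0.21, models))  # slow speech: only passive emotions scored
```

The key design point the sketch preserves is that Stage 1 prunes the candidate set before any acoustic-model scoring, so Stage 2 only discriminates among the two to four emotions sharing a speaking-rate group rather than all eight.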