This paper proposes a two-stage speech emotion recognition approach based on speaking rate. The emotions considered in this study are anger, disgust, fear, happiness, neutral, sadness, sarcasm, and surprise. In the first stage, the eight emotions are categorized by speaking rate into three broad groups, namely active (fast), normal, and passive (slow). In the second stage, these three broad groups are further classified into individual emotions using vocal tract characteristics. Gaussian mixture models (GMMs) are used to develop the emotion models. Emotion classification performance at the broad-group level, based on speaking rate, is found to be around 99% for speaker- and text-dependent cases. The overall emotion classification performance is observed to improve with the proposed two-stage approach. In the second stage, formant features are explored alongside spectral features to achieve robust emotion recognition in speaker-, gender-, and text-independent cases.
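The two-stage pipeline described above can be sketched in code. The following is a minimal illustrative sketch, not the authors' implementation: the speaking-rate thresholds, the assignment of emotions to the active/normal/passive groups, and the use of a single 1-D Gaussian per emotion (in place of the paper's full GMMs over spectral and formant features) are all simplifying assumptions made here for clarity.

```python
import math

# Assumed grouping of the eight emotions by speaking rate (illustrative;
# the actual grouping is determined empirically in the paper).
RATE_GROUPS = {
    "active":  ("anger", "fear", "happiness", "surprise"),
    "normal":  ("neutral", "sarcasm"),
    "passive": ("sadness", "disgust"),
}

def broad_group(speaking_rate):
    """Stage 1: map speaking rate (syllables/s, assumed thresholds)
    to a broad emotion group."""
    if speaking_rate > 5.0:
        return "active"
    if speaking_rate > 3.5:
        return "normal"
    return "passive"

def gaussian_loglik(x, mean, var):
    """Log-likelihood of scalar feature x under a 1-D Gaussian
    (stand-in for a full GMM over vocal-tract features)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def classify(speaking_rate, spectral_feature, emotion_models):
    """Stage 2: score only the emotions in the broad group chosen by
    Stage 1, and return the most likely one."""
    candidates = RATE_GROUPS[broad_group(speaking_rate)]
    return max(candidates,
               key=lambda e: gaussian_loglik(spectral_feature,
                                             *emotion_models[e]))

# Toy per-emotion (mean, variance) models of one spectral feature;
# in practice these would be GMMs trained on spectral/formant vectors.
models = {
    "anger": (0.8, 0.02), "fear": (0.6, 0.02),
    "happiness": (0.7, 0.02), "surprise": (0.9, 0.02),
    "neutral": (0.4, 0.02), "sarcasm": (0.5, 0.02),
    "sadness": (0.2, 0.02), "disgust": (0.3, 0.02),
}

print(classify(5.6, 0.82, models))  # fast speech: only active emotions scored
print(classify(2.8, 0.21, models))  # slow speech: only passive emotions scored
```

The key design point the sketch preserves is that Stage 1 prunes the candidate set before any acoustic-model scoring, so Stage 2 only discriminates among the two to four emotions sharing a speaking-rate group rather than all eight.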