During expressive speech, the voice is enriched to convey not only the intended semantic message but also the emotional state of the speaker. The pitch contour is one of the important properties of speech that is affected by this emotional modulation. Although pitch features have been commonly used to recognize emotions, it is not clear what aspects of the pitch contour are the most emotionally salient. This paper presents an analysis of the statistics derived from the pitch contour. First, pitch features derived from emotional speech samples are compared with those derived from neutral speech, using the symmetric Kullback-Leibler distance. Then, the emotionally discriminative power of the pitch features is quantified by comparing nested logistic regression models. The results indicate that gross pitch contour statistics such as mean, maximum, minimum, and range are more emotionally prominent than features describing the pitch shape. Also, analyzing the pitch statistics at the utterance level is found to be more accurate and robust than analyzing the pitch statistics for shorter speech regions (e.g., voiced segments). Finally, the best features are selected to build a binary emotion detection system that distinguishes emotional from neutral speech. A new two-step approach is proposed. In the first step, reference models for the pitch features are trained with neutral speech, and the input features are contrasted with the neutral model. In the second step, a fitness measure is used to assess whether the input speech is similar to the reference models (neutral speech) or different from them (emotional speech). The proposed approach is tested with four acted emotional databases spanning different emotional categories, recording settings, speakers, and languages. The results show that the recognition accuracy of the system is over 77% with the pitch features alone (baseline 50%).
When compared to conventional classification schemes, the proposed approach performs better in terms of both accuracy and robustness.
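The two ideas named in the abstract, the symmetric Kullback-Leibler distance and the neutral-reference detection scheme, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the histogram inputs, the single-Gaussian neutral model, the z-score fitness measure, and the threshold of 2.0 are all assumptions made for illustration. The symmetric distance is taken here as the sum of the two directed divergences; some definitions halve that sum.

```python
import math
from statistics import mean, stdev


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL distance between two histograms over the same bins.

    Assumption: defined as D(p||q) + D(q||p). Inputs may be raw counts;
    they are smoothed with eps and renormalized to sum to 1.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    total_p, total_q = sum(p), sum(q)
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    kl_pq = sum(a * math.log(a / b) for a, b in zip(p, q))
    kl_qp = sum(b * math.log(b / a) for a, b in zip(p, q))
    return kl_pq + kl_qp


def fit_neutral_model(neutral_values):
    """Step 1 (hypothetical): model one pitch statistic on neutral speech
    as a single Gaussian, returned as (mean, standard deviation)."""
    return mean(neutral_values), stdev(neutral_values)


def fitness(value, model):
    """Step 2 (hypothetical): distance of an input statistic from the
    neutral model, in standard deviations. Larger means less neutral."""
    mu, sigma = model
    return abs(value - mu) / sigma


def is_emotional(value, model, threshold=2.0):
    """Flag the input as emotional when it fits the neutral model poorly.
    The threshold is an illustrative choice, not a value from the paper."""
    return fitness(value, model) > threshold


if __name__ == "__main__":
    # Toy histograms of a pitch statistic (made-up counts).
    neutral_hist = [10, 30, 40, 15, 5]
    emotional_hist = [2, 10, 25, 35, 28]
    print(symmetric_kl(neutral_hist, emotional_hist))

    # Toy utterance-level mean F0 values in Hz (made-up numbers).
    model = fit_neutral_model([118.0, 120.0, 122.0])
    print(is_emotional(121.0, model), is_emotional(130.0, model))
```

In this sketch a feature whose emotional and neutral histograms differ more yields a larger symmetric distance, and an utterance is labeled emotional only when its statistic falls far from the neutral reference. A practical system would combine several pitch statistics and a richer reference model.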