Automatic recognition of speech emotion using long-term spectro-temporal features

  • Authors:
  • Siqing Wu, Tiago H. Falk, Wai-Yip Chan

  • Affiliations:
  • Department of Electrical and Computer Engineering, Queen's University, Kingston, Ontario, Canada (all authors)

  • Venue:
  • DSP '09: Proceedings of the 16th International Conference on Digital Signal Processing
  • Year:
  • 2009

Abstract

This paper proposes a novel feature type for recognizing emotion from speech. The features are derived from a long-term spectro-temporal representation of speech and are compared against conventional short-term spectral features as well as widely used prosodic features. Experimental results on the Berlin emotional speech database show that the proposed features outperform both baseline feature types; an average recognition accuracy of 88.6% is achieved when a combined set of proposed and prosodic features is used to classify 7 discrete emotions. The proposed features are further evaluated on the VAM corpus for recognizing continuous emotion primitives, where they attain estimation performance comparable to that of human evaluators. A minimal illustrative sketch of a long-term spectro-temporal feature computation follows the abstract.
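
The abstract does not detail the exact feature computation. As an illustrative assumption only, the sketch below computes a modulation-spectrum-style long-term spectro-temporal representation: a short-term spectrogram followed by a second transform across time within each acoustic-frequency band, pooled into a fixed-length vector for a classifier. The function name, window sizes, and pooling scheme are hypothetical and not taken from the paper.

```python
import numpy as np
from scipy.signal import stft


def modulation_spectrum_features(x, fs=16000, nperseg=512, noverlap=384,
                                 n_mod_bins=8):
    """Sketch of long-term spectro-temporal features for one utterance.

    A short-term magnitude spectrogram is computed first; a second FFT is
    then taken across time within each acoustic-frequency band, capturing
    the slow temporal (modulation) dynamics over the whole utterance.
    All parameter choices here are illustrative assumptions.
    """
    # Short-term magnitude spectrogram: acoustic frequency x time frames.
    _, _, Zxx = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    S = np.abs(Zxx)

    # Second transform along the time axis yields the modulation spectrum
    # of each acoustic-frequency band (long-term temporal structure).
    M = np.abs(np.fft.rfft(S, axis=1))

    # Pool modulation frequencies into a few coarse bins per band and
    # flatten into a fixed-length feature vector (e.g., for an SVM).
    edges = np.linspace(0, M.shape[1], n_mod_bins + 1, dtype=int)
    pooled = np.stack(
        [M[:, a:b].mean(axis=1) for a, b in zip(edges[:-1], edges[1:])],
        axis=1,
    )
    return pooled.flatten()


if __name__ == "__main__":
    # Example usage on one second of dummy audio at 16 kHz.
    x = np.random.randn(16000)
    feats = modulation_spectrum_features(x)
    print(feats.shape)
```

Such a vector could then be concatenated with prosodic features (e.g., pitch and energy statistics) before classification, mirroring the combined feature set evaluated in the paper; the concatenation step itself is straightforward and not shown here.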