In this study, modulation spectral features (MSFs) are proposed for the automatic recognition of human affective information from speech. The features are extracted from an auditory-inspired long-term spectro-temporal representation. Obtained with an auditory filterbank followed by a modulation filterbank, the representation captures both acoustic frequency and temporal modulation frequency components, thereby conveying information that is important for human speech perception but missing from conventional short-term spectral features. In an experiment on classification of discrete emotion categories, the MSFs show promising performance compared with features based on mel-frequency cepstral coefficients and perceptual linear prediction coefficients, two commonly used short-term spectral representations. The MSFs further yield a substantial improvement in recognition performance when used to augment prosodic features, which have been used extensively for emotion recognition. With both feature types combined, an overall recognition rate of 91.6% is obtained for classifying seven emotion categories. Moreover, in an experiment on recognition of continuous emotions, the proposed features combined with prosodic features attain estimation performance comparable to human evaluation.
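The pipeline described above (an acoustic filterbank producing per-band temporal envelopes, followed by a modulation-frequency analysis of each envelope) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it substitutes a uniform grouping of STFT bins for the auditory filterbank and a pooled FFT magnitude for the modulation filterbank; the function name and parameters (`n_bands`, `n_mod`) are hypothetical.

```python
import numpy as np
from scipy.signal import stft

def modulation_spectral_features(x, fs, n_bands=8, n_mod=4,
                                 frame_len=0.025, hop=0.010):
    """Sketch of modulation spectral feature extraction.

    Returns n_bands * n_mod features: modulation-band energies
    per acoustic frequency band (a simplification of the paper's
    auditory/modulation filterbank analysis).
    """
    # Short-time spectrogram: acoustic-frequency analysis over time.
    nperseg = int(frame_len * fs)
    noverlap = nperseg - int(hop * fs)
    _, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    mag = np.abs(Z)  # shape: (freq_bins, frames)

    # Group frequency bins uniformly into n_bands acoustic bands
    # (assumption: the paper uses an auditory-inspired filterbank instead).
    bins_per_band = mag.shape[0] // n_bands
    feats = []
    for b in range(n_bands):
        # Temporal envelope of this acoustic band across frames.
        env = mag[b * bins_per_band:(b + 1) * bins_per_band].mean(axis=0)
        # Modulation spectrum: magnitude FFT of the (DC-removed) envelope.
        mod = np.abs(np.fft.rfft(env - env.mean()))
        # Pool modulation energies into n_mod modulation bands.
        edges = np.linspace(0, len(mod), n_mod + 1, dtype=int)
        feats.extend(mod[edges[i]:edges[i + 1]].sum() for i in range(n_mod))
    return np.array(feats)
```

In this sketch each utterance yields a fixed-length vector (here 8 acoustic bands x 4 modulation bands = 32 values) that jointly encodes acoustic frequency and temporal modulation frequency, which is the property the abstract attributes to the MSFs.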