Continuous emotion recognition with phonetic syllables

Authors:
A. Origlia; F. Cutugno; V. Galatà
Venue:
Speech Communication
Year:
2014

Abstract

As research on extracting acoustic properties of speech for emotion recognition progresses, feature extraction methods that meet the constraints of real-time processing systems become increasingly important. Past work has shown the importance of syllables for the transmission of emotions, while classical research methods in prosody show that studying intonation phenomena requires concentrating on specific areas of the speech signal. Technological approaches, however, are often designed to use the whole speech signal without taking into account the qualitative variability of its spectral content. Given this contrast with the theoretical basis of prosodic research, we present a feature extraction method built on a phonetic interpretation of the concept of syllable. In particular, we concentrate on the spectral content of syllabic nuclei, thus reducing the amount of information to be processed. Moreover, we introduce feature weighting based on syllabic prominence, so that the units of analysis are not all treated as equally important. The method is evaluated on a continuous, three-dimensional model of emotions built on the classical axes of Valence, Activation and Dominance, and is shown to be competitive with state-of-the-art performance. The potential impact of this approach on the design of affective computing systems is also analysed.
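
The idea of prominence-based feature weighting can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how prominence scores are computed or how weighted features are aggregated, so the function name `prominence_weighted_mean`, the use of a normalised weighted average, and the toy numbers below are all assumptions made for illustration. The sketch takes one feature vector per syllabic nucleus and one non-negative prominence score per nucleus, and returns an utterance-level vector in which more prominent nuclei contribute more.

```python
from typing import List, Sequence


def prominence_weighted_mean(
    nucleus_features: Sequence[Sequence[float]],
    prominence: Sequence[float],
) -> List[float]:
    """Average per-nucleus feature vectors, weighted by prominence.

    Hypothetical helper: the weighting scheme (a normalised weighted
    mean) is an assumption, not the scheme described in the paper.
    """
    if len(nucleus_features) != len(prominence):
        raise ValueError("need exactly one prominence score per nucleus")
    if not nucleus_features:
        raise ValueError("need at least one syllabic nucleus")
    if any(w < 0 for w in prominence):
        raise ValueError("prominence scores must be non-negative")

    total_weight = sum(prominence)
    if total_weight <= 0:
        raise ValueError("prominence scores must not all be zero")

    dim = len(nucleus_features[0])
    weighted_sum = [0.0] * dim
    for feats, weight in zip(nucleus_features, prominence):
        if len(feats) != dim:
            raise ValueError("all feature vectors must have the same length")
        # Accumulate each feature scaled by the nucleus prominence.
        for i, value in enumerate(feats):
            weighted_sum[i] += weight * value

    # Normalise so the result stays on the scale of the input features.
    return [s / total_weight for s in weighted_sum]


# Toy example: two nuclei with two features each (made-up values).
# The second nucleus is three times as prominent as the first.
features = [[1.0, 2.0], [3.0, 6.0]]
scores = [1.0, 3.0]
utterance_vector = prominence_weighted_mean(features, scores)
```

With equal prominence scores this reduces to a plain mean over nuclei, which corresponds to treating all units of analysis as equally important.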