Interrelation Between Speech and Facial Gestures in Emotional Utterances: A Single Subject Study

  • Authors:
  • C. Busso; S. S. Narayanan

  • Affiliations:
  • Integrated Media Systems Center, University of Southern California, Los Angeles, CA

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2007

Abstract

The verbal and nonverbal channels of human communication are internally and intricately connected. As a result, gestures and speech present high levels of correlation and coordination. This relationship is greatly affected by the linguistic and emotional content of the message. The present paper investigates the influence of articulation and emotions on the interrelation between facial gestures and speech. The analyses are based on an audio-visual database recorded from an actress with markers attached to her face, who was asked to read semantically neutral sentences expressing four emotional states (neutral, sadness, happiness, and anger). A multilinear regression framework is used to estimate facial features from acoustic speech parameters. The levels of coupling between the communication channels are quantified using Pearson's correlation between the recorded and estimated facial features. The results show that facial and acoustic features are strongly interrelated, with correlation levels higher than r = 0.8 when the mapping is computed at the sentence level using spectral envelope speech features. The results reveal that the lower face region shows the highest levels of activity and correlation. Furthermore, the correlation levels present significant interemotional differences, which suggest that emotional content affects the relationship between facial gestures and speech. Principal component analysis (PCA) shows that the audiovisual mapping parameters cluster in a smaller subspace, which suggests that there is an emotion-dependent structure that is preserved across sentences. This internal structure appears easier to model when prosodic features are used to estimate the audiovisual mapping. The results also reveal that the correlation levels within a sentence vary according to the broad phonetic properties present in the sentence. Consonants, especially unvoiced and fricative sounds, present the lowest correlation levels. Likewise, the results show that facial gestures are linked to speech at different resolutions. While the orofacial area is locally coupled with speech, other facial gestures, such as eyebrow motion, are linked only at the sentence level. The results presented here have important implications for applications such as facial animation and multimodal emotion recognition.
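
The following is a minimal sketch of the kind of analysis the abstract describes: a sentence-level multilinear (least-squares) mapping from acoustic speech features to facial marker features, evaluated with Pearson's correlation between recorded and estimated trajectories. All array names, shapes, and the synthetic data are illustrative assumptions, not taken from the paper or its database.

```python
# Sketch: per-sentence linear mapping from acoustic features to facial
# features, with coupling measured by Pearson's correlation.
# Data here is synthetic; shapes and names are assumptions for illustration.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Assumed frame-aligned data for one sentence:
#   X: acoustic features (n_frames x n_acoustic), e.g. spectral-envelope params
#   Y: facial features   (n_frames x n_facial),   e.g. marker coordinates
n_frames, n_acoustic, n_facial = 300, 13, 30
X = rng.standard_normal((n_frames, n_acoustic))
Y = X @ rng.standard_normal((n_acoustic, n_facial)) \
    + 0.1 * rng.standard_normal((n_frames, n_facial))

# Sentence-level multilinear regression: Y ≈ [X, 1] @ W (least squares)
X1 = np.hstack([X, np.ones((n_frames, 1))])   # add bias column
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)    # audiovisual mapping parameters
Y_hat = X1 @ W                                 # estimated facial features

# Coupling level: Pearson's r per facial feature, then averaged
r_per_feature = np.array(
    [pearsonr(Y[:, j], Y_hat[:, j])[0] for j in range(n_facial)]
)
print(f"mean correlation r = {r_per_feature.mean():.3f}")
```

The PCA step mentioned in the abstract would then operate on the per-sentence mapping parameters (e.g., the flattened W matrices stacked across sentences) to examine whether they occupy an emotion-dependent low-dimensional subspace; a routine such as sklearn.decomposition.PCA could be used for that purpose.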