Human nonverbal behavior recognition from multiple cues and modalities has attracted considerable interest in recent years. Despite this interest, many questions remain open, including the choice of feature representation, the use of static vs. dynamic classification schemes, the number and type of cues or modalities to use, and the optimal way of fusing them. This paper compares frame-based vs. window-based feature representations and static vs. dynamic classification schemes on two distinct problems in automatic human nonverbal behavior analysis: multicue discrimination between posed and spontaneous smiles from facial expressions, head, and shoulder movements; and audiovisual discrimination between laughter and speech. Single-cue and single-modality results are compared to multicue and multimodal results obtained with Neural Networks, Hidden Markov Models (HMMs), and 2- and 3-chain coupled HMMs. Subject-independent experimental evaluation shows that: 1) for both static and dynamic classification, fusing data from multiple cues and modalities benefits the overall recognition task; 2) the type of feature representation has a direct impact on classification performance; and 3) static classification is comparable to dynamic classification, both for multicue discrimination between posed and spontaneous smiles and for audiovisual discrimination between laughter and speech.
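The frame-based vs. window-based contrast can be sketched with a toy example. Everything below is illustrative and hypothetical, not the paper's actual features or classifiers: a synthetic 1-D "lip-corner displacement" track stands in for tracked geometric features, a nearest-class-mean rule stands in for the static classifiers, and the posed/spontaneous difference is reduced to onset speed.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_sequence(spontaneous: bool, length: int = 60) -> np.ndarray:
    """Toy 1-D 'lip-corner displacement' track (hypothetical data): here,
    spontaneous smiles are given a slower onset than posed ones."""
    onset = 30 if spontaneous else 8          # frames to reach the apex
    t = np.arange(length)
    track = np.clip(t / onset, 0.0, 1.0)
    return track + rng.normal(0.0, 0.05, size=length)

def frame_features(seq: np.ndarray) -> np.ndarray:
    """Frame-based representation: one measurement per frame, here summarised
    by the sequence mean -- temporal ordering is discarded."""
    return np.array([seq.mean()])

def window_features(seq: np.ndarray, win: int = 10) -> np.ndarray:
    """Window-based representation: statistics over sliding windows (here, a
    per-window slope), which retain coarse temporal structure."""
    slopes = [np.polyfit(np.arange(win), seq[i:i + win], 1)[0]
              for i in range(0, len(seq) - win + 1, win)]
    return np.array(slopes)

def nearest_mean_classify(train_X, train_y, x):
    """A minimal static classifier: assign x to the class whose training-set
    mean feature vector is closest in Euclidean distance."""
    means = {c: train_X[train_y == c].mean(axis=0) for c in np.unique(train_y)}
    return min(means, key=lambda c: np.linalg.norm(x - means[c]))

# Small synthetic corpus: label 1 = spontaneous, 0 = posed.
y = np.array([i % 2 for i in range(40)])
seqs = [make_sequence(bool(lbl)) for lbl in y]

results = {}
for name, feat in [("frame-based", frame_features), ("window-based", window_features)]:
    X = np.stack([feat(s) for s in seqs])
    # Leave-one-out evaluation, mirroring subject-independent testing in spirit.
    results[name] = sum(
        nearest_mean_classify(np.delete(X, i, 0), np.delete(y, i), X[i]) == y[i]
        for i in range(len(y)))
    print(f"{name}: {results[name]}/{len(y)} correct")
```

On this deliberately easy synthetic data both representations separate the classes; the point of the sketch is only the structural difference between the two feature pipelines, not a performance claim.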