Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating information from the audio and video channels improves performance over single-modal approaches. Each channel consists of two streams (cues): facial expressions and head movements for video, and spectral and prosodic features for audio. We use decision-level fusion to integrate the information from the two channels, experimenting with the SUM rule and a neural network as the integration functions. The results indicate that even a simple linear function such as the SUM rule achieves very good performance in audiovisual fusion. We also experimented with different combinations of cues; the most informative are the facial expressions and the spectral features. The best combination is the integration of facial expressions, spectral, and prosodic features, with a neural network as the fusion method. When tested in a person-independent manner on 96 audiovisual sequences depicting spontaneously displayed (as opposed to posed) laughter and speech episodes, the proposed audiovisual approach achieves a recall rate above 90% and a precision above 80%.
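The SUM rule mentioned above can be sketched minimally: each cue classifier outputs a posterior probability of "laughter", and the fused score is simply their (optionally weighted) average, thresholded to make the final decision. The cue names, example probabilities, and 0.5 threshold below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sum_rule_fusion(cue_probs, weights=None):
    """Fuse per-cue laughter posteriors with the SUM rule.

    cue_probs: array of shape (n_cues, n_samples), where each row holds one
    cue classifier's posterior probability of "laughter" per segment.
    Returns the fused score per segment (a weighted mean; uniform by default).
    """
    cue_probs = np.asarray(cue_probs, dtype=float)
    if weights is None:
        weights = np.full(cue_probs.shape[0], 1.0 / cue_probs.shape[0])
    weights = np.asarray(weights, dtype=float)
    return weights @ cue_probs  # (n_cues,) @ (n_cues, n_samples) -> (n_samples,)

# Hypothetical posteriors from the four cue classifiers for three segments.
probs = [
    [0.9, 0.2, 0.6],  # facial expressions (video)
    [0.7, 0.4, 0.5],  # head movements (video)
    [0.8, 0.1, 0.7],  # spectral features (audio)
    [0.6, 0.3, 0.4],  # prosodic features (audio)
]
fused = sum_rule_fusion(probs)
labels = fused > 0.5  # decide "laughter" when the fused score exceeds 0.5
```

A trained neural network, as used for the best-performing cue combination, would replace the fixed linear average here with a learned, nonlinear integration function over the same per-cue posteriors.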