Previous research on automatic laughter detection has focused mainly on audio. In this study we present an audiovisual approach to discriminating laughter from speech based on temporal features, and we show that integrating information from the audio and video channels improves performance over single-modal approaches. Static features are extracted on a per-frame basis from audio and video and are combined with temporal features computed over a temporal window, which describe how the static features evolve over time. We investigate several different temporal features and show that adding temporal information improves performance over using static information alone. It is common to use a fixed set of temporal features, which implicitly assumes that all static features exhibit the same behaviour over a temporal window. This does not always hold: we show that when AdaBoost is used as a feature selector, different temporal features are selected for each static feature, i.e., the temporal evolution of each static feature is described by different statistical measures. When tested, person-independently, on 96 audiovisual sequences depicting spontaneously displayed (as opposed to posed) laughter and speech episodes, the proposed audiovisual approach achieves an F1 score of over 89%.
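The sketch below illustrates the two ideas in the abstract, not the authors' actual implementation: each static feature's evolution over a temporal window is summarised by several statistics, and AdaBoost with decision stumps then selects, per static feature, which temporal descriptor to keep. The window length, the four statistics, the synthetic data, and the use of scikit-learn's AdaBoostClassifier are all illustrative assumptions.

```python
# Minimal sketch: per-window temporal descriptors of static features, with
# AdaBoost (decision stumps) acting as a per-feature temporal-feature selector.
# All names, sizes, and data here are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

WINDOW = 20  # frames per temporal window (assumed value)

def temporal_features(frames):
    """Map a (WINDOW, n_static) block of per-frame static features to one
    vector of temporal descriptors: mean, std, range, and mean absolute
    first difference, computed separately for every static feature."""
    stats = [
        frames.mean(axis=0),
        frames.std(axis=0),
        frames.max(axis=0) - frames.min(axis=0),
        np.abs(np.diff(frames, axis=0)).mean(axis=0),
    ]
    return np.concatenate(stats)

# Synthetic stand-in data: "laughter" windows get extra temporal variation.
rng = np.random.default_rng(0)
n_windows, n_static = 400, 6  # e.g. 6 static audio/visual features per frame
X, y = [], []
for i in range(n_windows):
    label = i % 2  # 0 = speech, 1 = laughter (synthetic labels)
    block = rng.normal(size=(WINDOW, n_static))
    if label:
        block += np.sin(np.linspace(0, 6, WINDOW))[:, None] * rng.uniform(0.5, 2)
    X.append(temporal_features(block))
    y.append(label)
X, y = np.array(X), np.array(y)

# Each stump thresholds a single column, so the columns the ensemble actually
# uses amount to a selection of (static feature, temporal statistic) pairs.
clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=50)
clf.fit(X, y)

stat_names = ["mean", "std", "range", "mean|diff|"]
for col in np.nonzero(clf.feature_importances_)[0]:
    print(f"static feature {col % n_static} -> {stat_names[col // n_static]}")
```

Because the boosting rounds are free to pick any column, different static features typically end up described by different statistics, which is the behaviour the abstract contrasts with a fixed, shared set of temporal features.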