Past research on automatic laughter detection has focused mainly on audio-based detection. Here we present an audiovisual approach to distinguishing laughter from speech, and we show that integrating information from the audio and video channels improves performance over single-modal approaches. Each channel consists of two streams (cues): facial expressions and head movements for video, and spectral and prosodic features for audio. We use decision-level fusion to integrate the information from the two channels, experimenting with the SUM rule and a neural network as the integration functions. The results indicate that even a simple linear function such as the SUM rule achieves very good performance in audiovisual fusion. We also experimented with different combinations of cues; the most informative are the facial expressions and the spectral features. The best combination is the integration of facial expressions, spectral, and prosodic features, with a neural network as the fusion method. When tested in a person-independent manner on 96 audiovisual sequences depicting spontaneously displayed (as opposed to posed) laughter and speech episodes, the proposed audiovisual approach achieves a recall rate above 90% and a precision above 80%.
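The SUM rule mentioned above can be sketched minimally: each cue classifier outputs a posterior probability of "laughter", and the fused score is simply their (optionally weighted) average, thresholded to make the final decision. The cue names, example probabilities, and 0.5 threshold below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sum_rule_fusion(cue_probs, weights=None):
    """Fuse per-cue laughter posteriors with the SUM rule.

    cue_probs: array of shape (n_cues, n_samples), where each row holds one
    cue classifier's posterior probability of "laughter" per segment.
    Returns the fused score per segment (a weighted mean; uniform by default).
    """
    cue_probs = np.asarray(cue_probs, dtype=float)
    if weights is None:
        weights = np.full(cue_probs.shape[0], 1.0 / cue_probs.shape[0])
    weights = np.asarray(weights, dtype=float)
    return weights @ cue_probs  # (n_cues,) @ (n_cues, n_samples) -> (n_samples,)

# Hypothetical posteriors from the four cue classifiers for three segments.
probs = [
    [0.9, 0.2, 0.6],  # facial expressions (video)
    [0.7, 0.4, 0.5],  # head movements (video)
    [0.8, 0.1, 0.7],  # spectral features (audio)
    [0.6, 0.3, 0.4],  # prosodic features (audio)
]
fused = sum_rule_fusion(probs)
labels = fused > 0.5  # decide "laughter" when the fused score exceeds 0.5
```

A trained neural network, as used for the best-performing cue combination, would replace the fixed linear average here with a learned, nonlinear integration function over the same per-cue posteriors.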