Previous research on automatic laughter detection has focused mainly on audio. In this study we present an audiovisual approach to discriminating laughter from speech based on temporal features, and we show that integrating information from the audio and video channels improves performance over single-modal approaches. Static features are extracted on a per-frame basis from audio and video and are combined with temporal features computed over a temporal window, which describe how the static features evolve over time. We investigate several different temporal features and show that adding temporal information improves performance over using static information alone. It is common to use a fixed set of temporal features, which implicitly assumes that all static features exhibit the same behaviour over a temporal window. This does not always hold: we show that when AdaBoost is used as a feature selector, different temporal features are selected for each static feature, i.e., the temporal evolution of each static feature is described by different statistical measures. When tested, person-independently, on 96 audiovisual sequences depicting spontaneously displayed (as opposed to posed) laughter and speech episodes, the proposed audiovisual approach achieves an F1 score of over 89%.
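The sketch below illustrates the two ideas in the abstract, not the authors' actual implementation: each static feature's evolution over a temporal window is summarised by several statistics, and AdaBoost with decision stumps then selects, per static feature, which temporal descriptor to keep. The window length, the four statistics, the synthetic data, and the use of scikit-learn's AdaBoostClassifier are all illustrative assumptions.

```python
# Minimal sketch: per-window temporal descriptors of static features, with
# AdaBoost (decision stumps) acting as a per-feature temporal-feature selector.
# All names, sizes, and data here are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

WINDOW = 20  # frames per temporal window (assumed value)

def temporal_features(frames):
    """Map a (WINDOW, n_static) block of per-frame static features to one
    vector of temporal descriptors: mean, std, range, and mean absolute
    first difference, computed separately for every static feature."""
    stats = [
        frames.mean(axis=0),
        frames.std(axis=0),
        frames.max(axis=0) - frames.min(axis=0),
        np.abs(np.diff(frames, axis=0)).mean(axis=0),
    ]
    return np.concatenate(stats)

# Synthetic stand-in data: "laughter" windows get extra temporal variation.
rng = np.random.default_rng(0)
n_windows, n_static = 400, 6  # e.g. 6 static audio/visual features per frame
X, y = [], []
for i in range(n_windows):
    label = i % 2  # 0 = speech, 1 = laughter (synthetic labels)
    block = rng.normal(size=(WINDOW, n_static))
    if label:
        block += np.sin(np.linspace(0, 6, WINDOW))[:, None] * rng.uniform(0.5, 2)
    X.append(temporal_features(block))
    y.append(label)
X, y = np.array(X), np.array(y)

# Each stump thresholds a single column, so the columns the ensemble actually
# uses amount to a selection of (static feature, temporal statistic) pairs.
clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=50)
clf.fit(X, y)

stat_names = ["mean", "std", "range", "mean|diff|"]
for col in np.nonzero(clf.feature_importances_)[0]:
    print(f"static feature {col % n_static} -> {stat_names[col // n_static]}")
```

Because the boosting rounds are free to pick any column, different static features typically end up described by different statistics, which is the behaviour the abstract contrasts with a fixed, shared set of temporal features.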