Emotions can be recognized from audible paralinguistic cues in speech. Detecting these cues, which include laughter, a trembling voice, coughs, and changes in the intonation contour, can reveal information about the speaker's state and emotion. This paper describes the development of a gender-independent laughter detector with the aim of enabling automatic emotion recognition. Different types of features (spectral, prosodic) for laughter detection were investigated using classification techniques often used in language and speaker recognition: Gaussian Mixture Models, Support Vector Machines, and Multi-Layer Perceptrons. Classification experiments were carried out on short, pre-segmented speech and laughter segments extracted from the ICSI Meeting Recorder Corpus (mean duration approximately 2 s). Equal error rates of around 3% were obtained on speaker-independent speech data. We found that fusing classifiers based on Gaussian Mixture Models with classifiers based on Support Vector Machines increases discriminative power, and that fusing classifiers that use spectral features with classifiers that use prosodic information usually improves discrimination between laughter and speech. Our acoustic measurements showed differences between laughter and speech in mean pitch and in the ratio of unvoiced to voiced duration, indicating that these prosodic features are indeed useful for discriminating laughter from speech.
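The evaluation and fusion steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `equal_error_rate` and `fuse` are hypothetical helper names, and the z-normalized weighted-sum fusion is one common way to combine scores from two classifiers (e.g. a GMM-based and an SVM-based laughter detector) at the score level.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the EER: the operating point where the false-accept
    rate (speech scored as laughter) equals the false-reject rate
    (laughter scored as speech).

    scores: higher values mean "more laughter-like".
    labels: 1 = laughter, 0 = speech.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.sort(np.unique(scores))
    # For each candidate threshold, compute FAR and FRR.
    far = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])
    frr = np.array([(scores[labels == 1] < t).mean() for t in thresholds])
    # EER is taken where the two rates are closest.
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0

def fuse(scores_a, scores_b, w=0.5):
    """Score-level fusion of two classifiers: z-normalize each score
    stream so the streams are comparable, then take a weighted sum."""
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    za = (scores_a - scores_a.mean()) / scores_a.std()
    zb = (scores_b - scores_b.mean()) / scores_b.std()
    return w * za + (1.0 - w) * zb
```

With synthetic, well-separated score distributions for the two classes, `equal_error_rate` returns a small value, mirroring how the classifiers in the paper are compared; in practice the scores would come from GMM log-likelihood ratios and SVM decision values on the test segments.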