Automatic discrimination between laughter and speech

  • Authors:
  • Khiet P. Truong; David A. van Leeuwen

  • Affiliations:
  • TNO Human Factors, Department of Human Interfaces, P.O. Box 23, 3769 ZG Soesterberg, The Netherlands (both authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2007

Abstract

Emotions can be recognized from audible paralinguistic cues in speech. Detecting paralinguistic cues such as laughter, a trembling voice, coughs, and changes in the intonation contour can reveal information about the speaker's state and emotion. This paper describes the development of a gender-independent laugh detector with the aim of enabling automatic emotion recognition. Different types of features (spectral, prosodic) for laughter detection were investigated using different classification techniques (Gaussian Mixture Models, Support Vector Machines, Multi-Layer Perceptrons) often used in language and speaker recognition. Classification experiments were carried out with short, pre-segmented speech and laughter segments extracted from the ICSI Meeting Recorder Corpus (with a mean duration of approximately 2 s). Equal error rates of around 3% were obtained when tested on speaker-independent speech data. We found that a fusion between classifiers based on Gaussian Mixture Models and classifiers based on Support Vector Machines increases discriminative power. We also found that a fusion between classifiers that use spectral features and classifiers that use prosodic information usually increases the performance for discrimination between laughter and speech. Our acoustic measurements showed differences between laughter and speech in mean pitch and in the ratio of the durations of unvoiced to voiced portions, which indicate that these prosodic features are indeed useful for discrimination between laughter and speech.
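
The following is a minimal illustrative sketch (not the authors' implementation) of the general approach the abstract describes: one Gaussian Mixture Model per class trained on frame-level spectral features, a Support Vector Machine on segment-level prosodic features, and a simple linear score fusion of the two. Feature extraction, the number of mixture components, and the fusion weight are all placeholder assumptions; real features would be MFCC/PLP-like spectral vectors and prosodic statistics such as mean pitch and the unvoiced-to-voiced duration ratio.

```python
# Sketch of GMM + SVM laughter/speech discrimination with score fusion.
# Features are random placeholders; all hyperparameters are assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def fake_segment(n_frames=200, n_dims=13, shift=0.0):
    # Stand-in for frame-level spectral features of one segment.
    return rng.normal(loc=shift, scale=1.0, size=(n_frames, n_dims))

laughter_train = [fake_segment(shift=0.5) for _ in range(20)]
speech_train = [fake_segment(shift=-0.5) for _ in range(20)]

# One GMM per class, trained on pooled frames of that class.
gmm_laugh = GaussianMixture(n_components=8, covariance_type="diag",
                            random_state=0).fit(np.vstack(laughter_train))
gmm_speech = GaussianMixture(n_components=8, covariance_type="diag",
                             random_state=0).fit(np.vstack(speech_train))

def gmm_score(segment):
    # Average per-frame log-likelihood ratio: laughter vs. speech.
    return gmm_laugh.score(segment) - gmm_speech.score(segment)

# Segment-level prosodic features (e.g. mean pitch, unvoiced/voiced
# duration ratio) feed an SVM; here they are random placeholders.
X_prosodic = rng.normal(size=(40, 2))
y = np.array([1] * 20 + [0] * 20)  # 1 = laughter, 0 = speech
svm = SVC(probability=True).fit(X_prosodic, y)

def fused_score(segment, prosodic_vec, w=0.5):
    # Naive linear score fusion; the weight and the mixing of a
    # log-likelihood ratio with a posterior probability are assumptions.
    s_gmm = gmm_score(segment)
    s_svm = svm.predict_proba(prosodic_vec.reshape(1, -1))[0, 1]
    return w * s_gmm + (1 - w) * s_svm

test_segment = fake_segment(shift=0.5)
print(fused_score(test_segment, rng.normal(size=2)))
```

In practice the fused scores would be thresholded, and the equal error rate would be read off at the operating point where false-alarm and miss rates coincide.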