Static and Dynamic Modelling for the Recognition of Non-verbal Vocalisations in Conversational Speech

Authors:
Björn Schuller;Florian Eyben;Gerhard Rigoll
Affiliations:
Institute for Human-Machine Communication, Technische Universität München, 80333 München, Germany (all three authors)
Venue:
PIT '08: Proceedings of the 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems: Perception in Multimodal Dialogue Systems
Year:
2008

Abstract

Non-verbal vocalisations such as laughter, breathing, hesitation, and consent play an important role in the recognition and understanding of human conversational speech and spontaneous affect. In this contribution we discuss two strategies for the robust discrimination of such events: dynamic modelling of a broad selection of diverse acoustic Low-Level Descriptors (LLDs), versus static modelling, in which these LLDs are projected via statistical functionals onto a 0.6k-dimensional feature space and subsequently de-correlated. As classifiers we employ Hidden Markov Models and Conditional Random Fields for dynamic modelling, and Support Vector Machines for static modelling. To study extensive parameter optimisation with respect to features and model topology, 2.9k non-verbal vocalisations are extracted from the spontaneous Audio-Visual Interest Corpus. For the discrimination of the named classes we report an accuracy of 80.7% with a garbage model and 92.6% without one.
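The static-modelling route turns a variable-length sequence of LLD frames into a single fixed-length vector by applying statistical functionals to each LLD contour, which is what allows a static classifier such as an SVM to handle segments of any duration. The Python sketch below illustrates only that projection step, under stated assumptions: the LLD names (`pitch`, `energy`), the five functionals, and the helper `static_features` are hypothetical choices for illustration. The abstract does not list the paper's actual functional set (which yields about 0.6k features), and the subsequent de-correlation step and the classifiers are not reproduced here.

```python
import math
from typing import Callable, Dict, List, Sequence

# One LLD contour: one value per analysis frame of a segment.
Contour = Sequence[float]


def mean(x: Contour) -> float:
    return sum(x) / len(x)


def std(x: Contour) -> float:
    # Population standard deviation of the contour.
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def value_range(x: Contour) -> float:
    return max(x) - min(x)


# Hypothetical, deliberately small functional set (illustration only).
FUNCTIONALS: Dict[str, Callable[[Contour], float]] = {
    "mean": mean,
    "std": std,
    "min": min,
    "max": max,
    "range": value_range,
}


def static_features(llds: Dict[str, Contour]) -> Dict[str, float]:
    """Project variable-length LLD contours of one segment onto a fixed-length
    feature vector: every functional is applied to every LLD contour."""
    features: Dict[str, float] = {}
    for name in sorted(llds):  # fixed LLD order -> fixed feature order
        contour: List[float] = list(llds[name])
        if not contour:
            raise ValueError(f"empty contour for LLD {name!r}")
        for fname, func in FUNCTIONALS.items():
            features[f"{name}_{fname}"] = func(contour)
    return features


if __name__ == "__main__":
    # Segments of 3 and 50 frames map to vectors of the same dimension.
    short = {"pitch": [200.0, 210.0, 190.0], "energy": [0.5, 0.7, 0.6]}
    long = {"pitch": [180.0] * 50, "energy": [0.1 * i for i in range(50)]}
    print(len(static_features(short)), len(static_features(long)))
```

The design point is that the output dimension depends only on the number of LLDs times the number of functionals, not on segment length. A dynamic model such as an HMM would instead consume the frame-level contours directly, keeping the temporal structure that the functionals summarise away.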