Medium-term speaker states - A review on intoxication, sleepiness and the first challenge

Authors:
Björn Schuller;Stefan Steidl;Anton Batliner;Florian Schiel;Jarek Krajewski;Felix Weninger;Florian Eyben
Affiliations:
Technische Universität München, Institute for Human-Machine Communication, Germany and Joanneum Research Forschungsgesellschaft mbH, DIGITAL - Institute for Information and Communication ...;ICSI, Berkeley, CA, USA and FAU Erlangen-Nuremberg, Pattern Recognition Lab, Germany;Technische Universität München, Institute for Human-Machine Communication, Germany and FAU Erlangen-Nuremberg, Pattern Recognition Lab, Germany;Bavarian Archive for Speech Signals, Ludwig-Maximilians-Universität München, Germany;University of Würzburg, Industrial and Organizational Psychology, Germany;Technische Universität München, Institute for Human-Machine Communication, Germany;Technische Universität München, Institute for Human-Machine Communication, Germany
Venue:
Computer Speech and Language
Year:
2014

Abstract

In the emerging field of computational paralinguistics, most research efforts are devoted either to short-term speaker states such as emotions, or to long-term traits such as personality, gender, or age. To bridge this gap on the time axis, and hence broaden the scope of the field, the INTERSPEECH 2011 Speaker State Challenge addressed the algorithmic analysis of medium-term speaker states: alcohol intoxication and sleepiness, both of which are highly relevant in high-risk environments. Preserving the paradigms of the two previous INTERSPEECH Challenges, researchers were invited to participate in a large-scale evaluation providing unified testing conditions. This article reviews previous efforts to automatically recognise intoxication and sleepiness from speech signals, and gives an overview of the Challenge conditions and data sets, the methods used by the participants, and their results. By fusing participants' systems, we show that binary classification of intoxication and of sleepiness from short-term observations, i.e., single utterances, can each exceed 72% accuracy on unseen test data; furthermore, we demonstrate that these medium-term states can be recognised more robustly by fusing short-term classifiers along the time axis, reaching up to 91% accuracy for intoxication and 75% for sleepiness.
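The two fusion steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule was used, so simple majority voting is an assumption here, and the function names, session identifiers, and toy labels are all hypothetical. The sketch first fuses the per-utterance decisions of several systems, then fuses the resulting utterance-level decisions along the time axis into one decision per recording session.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label


def fuse_systems(system_predictions):
    """Fuse several systems' per-utterance predictions by majority vote.

    system_predictions: list of equally long label lists, one per system.
    """
    return [majority_vote(votes) for votes in zip(*system_predictions)]


def fuse_over_time(session_ids, utterance_labels):
    """Fuse utterance-level decisions into one decision per session."""
    by_session = defaultdict(list)
    for sid, label in zip(session_ids, utterance_labels):
        by_session[sid].append(label)
    return {sid: majority_vote(labels) for sid, labels in by_session.items()}


# Hypothetical toy data: five utterances from two sessions, three systems.
sessions = ["s1", "s1", "s1", "s2", "s2"]
system_a = ["intoxicated", "sober", "intoxicated", "sober", "sober"]
system_b = ["intoxicated", "intoxicated", "sober", "sober", "intoxicated"]
system_c = ["sober", "intoxicated", "intoxicated", "sober", "sober"]

# Step 1: fuse across systems, one decision per utterance.
fused_utterances = fuse_systems([system_a, system_b, system_c])

# Step 2: fuse along the time axis, one decision per session.
fused_sessions = fuse_over_time(sessions, fused_utterances)

print(fused_utterances)
print(fused_sessions)
```

In this toy case every single system misclassifies at least one utterance of session `s1`, yet the vote across systems and then across utterances yields a consistent session-level decision, which is the intuition behind fusing short-term classifiers over time.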