Applying multiple classifiers and non-linear dynamics features for detecting sleepiness from speech

Authors:
Jarek Krajewski; Sebastian Schnieder; David Sommer; Anton Batliner; Björn Schuller
Affiliations:
Experimental Industrial Psychology, University of Wuppertal, Gaußstraße 20, 42097 Wuppertal, Germany; Experimental Industrial Psychology, University of Wuppertal, Gaußstraße 20, 42097 Wuppertal, Germany; Neuro Computer Science and Signal Processing, University of Applied Sciences Schmalkalden, Germany; Pattern Recognition, Friedrich-Alexander University Erlangen-Nuremberg, Germany; Institute for Human-Machine Communication, Technische Universität München, Germany
Venue:
Neurocomputing
Year:
2012

References (14):
Bagging predictors. Machine Learning.
Speech during sustained operations. Speech Communication, Special Issue on Speech under Stress.
Practical method for determining the minimum embedding dimension of a scalar time series. Physica D.
Improved Accuracy in the Singularity Spectrum of Multifractal Chaotic Time Series. Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), Volume 3.
Automatic Feature Extraction for Classifying Audio Data. Machine Learning.
YALE: rapid prototyping for complex data mining tasks. Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Fractal aspects of speech signals: dimension and interpolation. Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing (ICASSP '91).
Synthesis and coding of continuous speech with the nonlinear oscillator model. Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96), Volume 1.
LIBLINEAR: A Library for Large Linear Classification. The Journal of Machine Learning Research.
A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Combining Bagging, Boosting and Dagging for Classification Problems. Proceedings of the 11th International Conference on Knowledge-Based Intelligent Information and Engineering Systems and the XVII Italian Workshop on Neural Networks (KES '07).
A finite element model of fluid flow in the vocal tract. Computer Speech and Language.
Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge. Speech Communication.
Energy separation in signal modulations with application to speech analysis. IEEE Transactions on Signal Processing.

Cited by (3):
Medium-term speaker states: A review on intoxication, sleepiness and the first challenge. Computer Speech and Language.
Analysis of voice features related to obstructive sleep apnoea and their application in diagnosis support. Computer Speech and Language.
Vocal fatigue induced by prolonged oral reading: Analysis and detection. Computer Speech and Language.

Abstract

The primary aim of this study is to compare different novel feature sets and classifiers for fatigue detection based on speech processing. To this end, we conducted a within-subject partial sleep deprivation design (20.00-04.00 h, N = 77 participants) and recorded 372 speech samples of sustained vowel phonation. Sleepiness reference values were determined from self-reports on the Karolinska Sleepiness Scale (KSS) and from observer reports on its observer version, the KSS Observer Scale. Feature extraction methods from non-linear dynamics (NLD) provide additional information on the dynamics and structure of sleepy speech. In all, 395 NLD features and 170 phonetic features were computed, some of which represent auditive-perceptual concepts that have not been described so far. Several NLD and phonetic features correlate significantly with the KSS ratings; among the NLD features, for example, the skewness of the vector length within the reconstructed phase space for male speakers (r = .56) and the mean of Cao's minimum embedding dimensions for female speakers (r = -.39). After a correlation-filter feature subset selection, different classification models and ensemble classifiers (AdaBoost, Bagging) were trained. Bagging procedures turned out to achieve the best performance for male and female speakers on both the phonetic and the NLD feature sets. For detecting sleepiness, the best models on the phonetic feature set achieved classification accuracies of 78.3% for male speakers (Naive Bayes) and 68.5% for female speakers (Bagging Bayes Net). The best models on the NLD feature set achieved 77.2% for male speakers and 76.8% for female speakers (both Bagging Bayes Net). Moreover, combining the phonetic and NLD feature sets provided additional information and raised the highest unweighted accuracy (UA) to 79.6% for male speakers (Bayes Net) and 77.1% for female speakers (AdaBoost Nearest Neighbor).
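The skewness feature singled out above presupposes a reconstructed phase space. A minimal sketch, assuming standard time-delay (Takens) embedding, the vector length taken as the Euclidean norm of each embedded vector, and moment-based skewness; the embedding dimension `dim=3` and delay `tau=5` are hypothetical defaults, and the abstract does not state the study's actual parameters, framing, or normalization:

```python
import math
from typing import List, Sequence


def delay_embed(x: Sequence[float], dim: int, tau: int) -> List[List[float]]:
    """Time-delay embedding: y_i = (x_i, x_{i+tau}, ..., x_{i+(dim-1)*tau})."""
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for the chosen dim and tau")
    return [[x[i + k * tau] for k in range(dim)] for i in range(n)]


def skewness(values: Sequence[float]) -> float:
    """Moment-based sample skewness m3 / m2**1.5 (0.0 for constant input)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def vector_length_skewness(x: Sequence[float], dim: int = 3, tau: int = 5) -> float:
    """Skewness of the Euclidean lengths of the embedded phase-space vectors."""
    lengths = [math.sqrt(sum(c * c for c in v)) for v in delay_embed(x, dim, tau)]
    return skewness(lengths)


if __name__ == "__main__":
    # Hypothetical vowel-like frame: 120 Hz fundamental plus two harmonics at 8 kHz.
    fs, f0 = 8000, 120.0
    frame = [sum(math.sin(2 * math.pi * h * f0 * n / fs) / h for h in (1, 2, 3))
             for n in range(800)]
    print(vector_length_skewness(frame))
```

In practice `tau` is often chosen from the first minimum of the mutual information or the first zero of the autocorrelation, and `dim` from a criterion such as Cao's; the values used here are only illustrative.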
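The mean of Cao's minimum embedding dimensions is another NLD feature named above. The sketch below follows the E1 statistic of Cao's method (Physica D, 1997): for each embedded vector, find its nearest neighbour in d dimensions under the maximum norm, measure how much their distance grows when a (d+1)-th delay coordinate is added, average that ratio into E(d), and inspect E1(d) = E(d+1)/E(d), which stops changing once d reaches the minimum embedding dimension. The delay `tau`, the limit `max_dim`, and the tolerance `tol` of the hypothetical helper `min_embedding_dim` are assumptions; the abstract does not say how frames were formed or how saturation was decided, and the brute-force neighbour search costs O(N²) per dimension.

```python
import math
from typing import List, Sequence


def _max_norm(a: Sequence[float], b: Sequence[float]) -> float:
    """Maximum (Chebyshev) norm of a - b, as used in Cao's paper."""
    return max(abs(p - q) for p, q in zip(a, b))


def cao_e(x: Sequence[float], d: int, tau: int) -> float:
    """Cao's E(d): mean ratio of neighbour distances in d + 1 versus d dimensions."""
    n = len(x) - d * tau  # only vectors whose (d+1)-dimensional version also exists
    if n < 2:
        raise ValueError("series too short for this d and tau")
    y_d = [[x[i + k * tau] for k in range(d)] for i in range(n)]
    y_d1 = [row + [x[i + d * tau]] for i, row in enumerate(y_d)]
    total, count = 0.0, 0
    for i in range(n):
        # Brute-force nearest neighbour of y_i(d); identical vectors (distance 0) are skipped.
        best_j, best_dist = -1, math.inf
        for j in range(n):
            if j != i:
                dist = _max_norm(y_d[i], y_d[j])
                if 0.0 < dist < best_dist:
                    best_j, best_dist = j, dist
        if best_j >= 0:
            total += _max_norm(y_d1[i], y_d1[best_j]) / best_dist  # a(i, d) >= 1
            count += 1
    if count == 0:
        raise ValueError("no non-identical neighbours found (constant signal?)")
    return total / count


def cao_e1(x: Sequence[float], max_dim: int, tau: int) -> List[float]:
    """Return [E1(1), ..., E1(max_dim)] with E1(d) = E(d + 1) / E(d)."""
    e = [cao_e(x, d, tau) for d in range(1, max_dim + 2)]
    return [e[k + 1] / e[k] for k in range(max_dim)]


def min_embedding_dim(e1: Sequence[float], tol: float = 0.05) -> int:
    """Hypothetical saturation rule: smallest d from which E1 changes by less than tol."""
    for k in range(len(e1) - 1):
        if abs(e1[k + 1] - e1[k]) < tol:
            return k + 1  # e1[k] holds E1(k + 1)
    return len(e1)  # no saturation found within max_dim


if __name__ == "__main__":
    # Hypothetical test signal: a sinusoid with a non-integer period in samples.
    x = [math.sin(2 * math.pi * n / 23.7) for n in range(200)]
    e1 = cao_e1(x, max_dim=4, tau=3)
    print([round(v, 3) for v in e1], min_embedding_dim(e1))
```

Cao's paper reads the saturation point off the E1 curve; the fixed `tol` threshold is just one way to automate that for per-frame feature extraction.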
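The abstract reports feature-to-KSS correlations and a correlation-filter feature subset selection before classification. One simple reading of such a filter is sketched below: rank every feature by its absolute Pearson correlation with the KSS ratings and keep the top `k`. This is an assumption; the study's actual selection procedure (which might, for instance, also penalize redundancy between features, as correlation-based feature subset selection does) and its choice of `k` are not given here, and the data in the example are hypothetical.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_filter(features: List[List[float]], kss: Sequence[float], k: int) -> List[int]:
    """Rank feature columns by |r| with the KSS ratings and return the indices of the top k."""
    n_features = len(features[0])
    scored = []
    for j in range(n_features):
        column = [row[j] for row in features]
        scored.append((abs(pearson_r(column, kss)), j))
    scored.sort(key=lambda s: (-s[0], s[1]))  # strongest |r| first, ties by column index
    return [j for _, j in scored[:k]]


if __name__ == "__main__":
    # Hypothetical data: 4 speech samples x 3 features, with KSS ratings.
    features = [[1, 5, 0.3], [2, 3, 0.1], [3, 4, 0.4], [4, 1, 0.2]]
    kss = [2, 4, 6, 8]
    print(correlation_filter(features, kss, k=2))  # [0, 1]
```

Ranking by the absolute value keeps negatively correlated features as well, such as the r = -.39 embedding-dimension feature reported for female speakers.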
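Bagging, one of the ensemble procedures named above, trains each base classifier on a bootstrap resample of the training data and combines their predictions by majority vote. The sketch below is a minimal illustration under stated assumptions: a Gaussian naive Bayes base learner stands in for the study's Bayes Net, labels are a hypothetical 0/1 encoding (e.g. alert vs. sleepy), and `n_estimators=25` is arbitrary. It also computes unweighted accuracy as the mean of per-class recalls, the usual definition of UA in speaker-state classification, so that always predicting the majority class cannot score well on an imbalanced test set.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# label -> (log prior, per-feature means, per-feature variances)
Model = Dict[int, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: List[List[float]], y: List[int], var_floor: float = 1e-6) -> Model:
    """Fit a Gaussian naive Bayes model (a stand-in, not the study's Bayes Net)."""
    model: Model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + var_floor
                     for col, m in zip(columns, means)]
        model[label] = (math.log(len(rows) / len(y)), means, variances)
    return model


def predict_gaussian_nb(model: Model, x: Sequence[float]) -> int:
    """Return the label with the highest log posterior."""
    def log_posterior(label: int) -> float:
        log_prior, means, variances = model[label]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                               for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


def fit_bagging(X: List[List[float]], y: List[int], n_estimators: int = 25, seed: int = 0) -> List[Model]:
    """Train each base model on a bootstrap sample drawn with replacement, same size as the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        models.append(fit_gaussian_nb([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagging(models: List[Model], x: Sequence[float]) -> int:
    """Majority vote over the bagged base models."""
    votes = Counter(predict_gaussian_nb(m, x) for m in models)
    return votes.most_common(1)[0][0]


def unweighted_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recalls, so each class counts equally regardless of its size."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical toy data: 2-D feature vectors for two well-separated classes.
    rng = random.Random(1)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)] + \
        [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(20)]
    y = [0] * 20 + [1] * 20
    models = fit_bagging(X, y)
    print(unweighted_accuracy(y, [predict_bagging(models, x) for x in X]))
    print(unweighted_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5 (plain accuracy would be 0.75)
```

The small `var_floor` keeps a bootstrap sample with a constant feature from producing a zero variance; real evaluations would of course score held-out speakers rather than the training data.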