We explore the use of features derived from multiresolution analysis of speech and the Teager energy operator for classifying drivers' speech under stressed conditions. We apply this feature set to a database of short speech utterances to build user-dependent discriminants of four stress categories. We also address the problem of choosing a suitable temporal scale for representing categorical differences in the data, which leads to two modeling approaches. In the first approach, the within-utterance dynamics of the feature set are assumed to be important for the classification task; these features are classified using dynamic Bayesian network models as well as a mixture of hidden Markov models (M-HMM). In the second approach, we define an utterance-level feature set by taking the mean value of each feature across the utterance, and model it with a support vector machine and a multilayer perceptron classifier. Comparing performance on the sparse and full dynamic representations against a chance level of 25%, we obtain the best results with the speaker-dependent mixture model (96.4% on the training set and 61.2% on a separate test set). We also investigate how these models perform on the speaker-independent task. Although the performance of the speaker-independent models degrades with respect to the models trained on individual speakers, the mixture model still outperforms the competing models and achieves recognition significantly better than chance (80.4% on the training set and 51.2% on a separate test set).
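The Teager energy operator underlying the feature set has a standard discrete form, Psi[x](n) = x(n)^2 - x(n-1)x(n+1), which for a sampled sinusoid tracks amplitude squared times frequency squared. The following is a minimal illustrative sketch of that operator (the function name and numpy usage are our own, not the paper's implementation):

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy operator: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1).

    Returns an array two samples shorter than the input, since the operator
    needs one neighbor on each side.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For a pure sinusoid A*cos(Omega*n), the operator is exactly A^2 * sin(Omega)^2,
# i.e. roughly proportional to (amplitude * frequency)^2 -- the property that
# makes TEO-based features sensitive to stress-related changes in the speech source.
n = np.arange(1000)
A, Omega = 0.5, 0.1
psi = teager_energy(A * np.cos(Omega * n))
```

In a feature pipeline of the kind the abstract describes, this operator would be applied per subband of the multiresolution decomposition, with the resulting frame-level values either modeled dynamically or averaged over the utterance for the static classifiers.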