Detecting changing emotions in human speech by machine and humans

Authors:
C. Natalie Wal; Wojtek Kowalczyk
Affiliations:
Department of Artificial Intelligence, VU University Amsterdam, 1081 HV Amsterdam, The Netherlands; Leiden Institute of Advanced Computer Science, Leiden University, 2333 CA Leiden, The Netherlands
Venue:
Applied Intelligence
Year:
2013

Abstract

The goals of this research were: (1) to develop a system that automatically measures changes in the emotional state of a speaker by analyzing his/her voice, (2) to validate this system with a controlled experiment, and (3) to visualize the results to the speaker in a 2-D space. Natural (non-acted) speech from 77 Dutch speakers was collected and manually divided into meaningful speech units. Three recordings were collected per speaker: one in a positive, one in a neutral and one in a negative state. For each recording, the speaker rated 16 emotional states on a 10-point Likert scale. The Random Forest algorithm was applied to 207 speech features extracted from the recordings to qualify (classification) and quantify (regression) the changes in the speaker's emotional state. Results showed that both the direction of change of emotions and the change in intensity (the latter measured by Mean Squared Error) can be predicted better than the respective baselines, namely the most frequent class label and the mean value of change. Moreover, changes in negative emotions turned out to be more predictable than changes in positive emotions. A controlled experiment investigated the difference between human and machine performance in judging the emotional states in one's own voice and in the voice of another person. Results showed that humans performed worse than the algorithm on both the detection and the regression problems. Humans, just like the machine algorithm, were better at detecting changes in negative emotions than in positive ones. Finally, the results of applying Principal Component Analysis (PCA) to our data provided a validation of dimensional emotion theories and suggest that PCA is a promising technique for visualizing the user's emotional state in the envisioned application.
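The evaluation set-up described in the abstract (a Random Forest for the direction of change and for the size of change, each compared with a trivial baseline) can be sketched as follows. This is a minimal illustration, not the authors' code. It uses scikit-learn's `RandomForestClassifier` and `RandomForestRegressor`, with `DummyClassifier(strategy="most_frequent")` and `DummyRegressor(strategy="mean")` standing in for the two baselines. It assumes that each example is a vector of 207 feature differences between two recordings of the same speaker and that the target is the change in one self-rated emotion on a 1 to 10 scale; the data are synthetic, and `make_synthetic_data` and `evaluate` are hypothetical helper names.

```python
import numpy as np
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.model_selection import train_test_split

N_FEATURES = 207  # number of speech features mentioned in the abstract


def make_synthetic_data(n_pairs=300, seed=0):
    """Hypothetical stand-in data: per-pair feature differences and the change
    in one self-rated emotion. Real features would come from the recordings."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_pairs, N_FEATURES))
    # Synthetic rating change driven by two features plus noise, rounded and
    # clipped to -9..9 (the range of differences, assuming ratings 1..10).
    change = np.clip(np.round(2 * X[:, 0] - X[:, 1] + rng.normal(size=n_pairs)), -9, 9)
    return X, change


def evaluate(X, change, seed=0):
    """Compare Random Forests with most-frequent-class and mean-change baselines."""
    direction = np.sign(change).astype(int)  # -1 = decrease, 0 = no change, +1 = increase
    X_tr, X_te, d_tr, d_te, c_tr, c_te = train_test_split(
        X, direction, change, test_size=0.3, random_state=seed)

    # Classification: direction of change vs. the most frequent class label.
    clf = RandomForestClassifier(n_estimators=100, random_state=seed).fit(X_tr, d_tr)
    base_clf = DummyClassifier(strategy="most_frequent").fit(X_tr, d_tr)

    # Regression: size of change, scored by MSE, vs. the mean change.
    reg = RandomForestRegressor(n_estimators=100, random_state=seed).fit(X_tr, c_tr)
    base_reg = DummyRegressor(strategy="mean").fit(X_tr, c_tr)

    return {
        "rf_accuracy": accuracy_score(d_te, clf.predict(X_te)),
        "baseline_accuracy": accuracy_score(d_te, base_clf.predict(X_te)),
        "rf_mse": mean_squared_error(c_te, reg.predict(X_te)),
        "baseline_mse": mean_squared_error(c_te, base_reg.predict(X_te)),
    }


if __name__ == "__main__":
    X, change = make_synthetic_data()
    for name, value in evaluate(X, change).items():
        print(f"{name}: {value:.3f}")
```

Fitting the baselines as dummy estimators on the same train/test split as the forests keeps the comparison fair: any gap then reflects only what the speech features add. Numbers printed for the synthetic data say nothing about the study's actual results.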
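The abstract proposes PCA for showing a user's emotional state in a 2-D space. A minimal sketch of that projection, computed via the singular value decomposition of mean-centred data, is shown below. It assumes, as an illustration only, that the input is a matrix of the 16 self-rated emotional states per recording; the abstract does not specify which data PCA was applied to. The function name `pca_2d` and the random ratings are hypothetical.

```python
import numpy as np


def pca_2d(ratings):
    """Project rows of `ratings` (recordings x emotion scales) onto the first
    two principal components. Returns (scores, components, explained_ratio)."""
    R = np.asarray(ratings, dtype=float)
    centered = R - R.mean(axis=0)  # PCA operates on mean-centred data
    # SVD of the centred matrix: rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:2]                  # 2 x n_scales loadings
    scores = centered @ components.T     # n_recordings x 2 plot coordinates
    explained = s**2 / np.sum(s**2)      # share of total variance per component
    return scores, components, explained[:2]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical ratings: 30 recordings x 16 emotional states on a 1-10 scale.
    ratings = rng.integers(1, 11, size=(30, 16))
    scores, components, explained = pca_2d(ratings)
    print(scores.shape, explained.round(3))
```

Each row of `scores` gives a point that could be plotted for the speaker, and the loadings in `components` show which emotion scales drive each axis. Whether those axes line up with the dimensions of a particular dimensional emotion theory is an empirical question for real data.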