The automatic recognition of a user's communicative style within a spoken dialog system framework, including its affective aspects, has received increased attention in recent years. For dialog systems, it is important to know not only what was said but also how it was communicated, so that the system can engage the user in richer and more natural interaction. This paper addresses the problem of automatically detecting "frustration", "politeness", and "neutral" attitudes from a child's speech communication cues, elicited in spontaneous dialog interactions with computer characters. Several information sources, such as acoustic, lexical, and contextual features, as well as their combinations, are used for this purpose. The study is based on a Wizard-of-Oz dialog corpus of 103 children, 7-14 years of age, playing a voice-activated computer game. Three-way classification experiments, as well as pairwise classification between polite vs. others and frustrated vs. others, were performed. Experimental results show that lexical information has more discriminative power than acoustic and contextual cues for detecting politeness, whereas contextual and acoustic features perform best for detecting frustration. Furthermore, fusing acoustic, lexical, and contextual information yielded significantly better classification results. Results also showed that classification performance varies with age and gender: for the "politeness" detection task, higher classification accuracy was achieved for females and for 10-11 year-olds than for males and for other age groups, respectively.
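The abstract reports that fusing acoustic, lexical, and contextual information outperforms any single cue. One common way to realize such a combination is score-level (late) fusion, where each single-cue classifier outputs class posteriors and the system combines them with a weighted sum. The sketch below illustrates that idea only; the class names match the paper's three attitudes, but the fusion function, weights, and the example posterior values are illustrative assumptions, not the paper's actual method or data.

```python
# Illustrative sketch of score-level (late) fusion over three attitude
# classes. The posteriors and weights below are made up for illustration;
# the paper does not specify this particular fusion scheme.

CLASSES = ["frustrated", "polite", "neutral"]

def fuse_scores(acoustic, lexical, contextual, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-class posteriors from the three single-cue
    classifiers; returns the winning class label and the fused scores."""
    fused = {}
    for c in CLASSES:
        fused[c] = (weights[0] * acoustic[c]
                    + weights[1] * lexical[c]
                    + weights[2] * contextual[c])
    return max(fused, key=fused.get), fused

# Hypothetical posteriors from each single-cue classifier for one utterance.
acoustic   = {"frustrated": 0.6, "polite": 0.2, "neutral": 0.2}
lexical    = {"frustrated": 0.2, "polite": 0.6, "neutral": 0.2}
contextual = {"frustrated": 0.4, "polite": 0.3, "neutral": 0.3}

label, fused = fuse_scores(acoustic, lexical, contextual)
print(label)  # here the acoustic and contextual evidence outweighs the lexical
```

In a pairwise setup such as the paper's "polite vs. others" experiment, the same scheme applies with two classes instead of three; per-cue weights could also be tuned on held-out data so that, e.g., lexical scores count more for politeness detection.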