Recognizing child's emotional state in problem-solving child-machine interactions

Authors:
Serdar Yildirim; Shrikanth Narayanan
Affiliations:
Mustafa Kemal University, Hatay, Turkey; University of Southern California, Los Angeles
Venue:
Proceedings of the 2nd Workshop on Child, Computer and Interaction
Year:
2009

Abstract

The need for automatic recognition of a speaker's emotion within a spoken dialog system framework has received increased attention with the demand for computer interfaces that provide natural and user-adaptive spoken interaction. This paper addresses the problem of automatically recognizing a child's emotional state using information obtained from audio and video signals. The study is based on a multimodal data corpus consisting of spontaneous conversations between a child and a computer agent. Four techniques, a k-nearest neighbor (k-NN) classifier, a decision tree, a linear discriminant classifier (LDC), and a support vector machine classifier (SVC), were used to classify utterances into two emotion classes, negative and non-negative, from both acoustic and visual information. Experimental results show that, overall, combining visual information with acoustic information improves emotion recognition performance. The best results were obtained when the information sources were combined at the feature level. Specifically, with feature-level fusion, adding visual information to acoustic information yielded a relative improvement in emotion recognition of 3.8% over using acoustic information alone, for both the LDC and the SVC.
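
As a rough illustration of what feature-level fusion and a relative improvement figure mean in this setting, the sketch below concatenates per-utterance acoustic and visual feature vectors and classifies them with a small k-NN. It is not the authors' implementation. The function names, the synthetic Gaussian toy data, the feature dimensions, and the choice of k are all assumptions made for illustration; the paper's actual features, corpus, and classifier settings are not given in the abstract.

```python
import math
import random


def fuse_features(acoustic, visual):
    """Feature-level fusion: concatenate the two vectors of each utterance."""
    return [a + v for a, v in zip(acoustic, visual)]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    Use an odd k with two classes so the vote cannot tie.
    """
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def accuracy(train_x, train_y, test_x, test_y, k=3):
    """Fraction of test utterances whose predicted label is correct."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_y)


def relative_improvement(baseline, improved):
    """Gain of `improved` over `baseline`, as a percentage of `baseline`."""
    return 100.0 * (improved - baseline) / baseline


def make_toy_utterances(n, seed=0):
    """Synthetic stand-in data (hypothetical, unrelated to the paper's corpus)."""
    rng = random.Random(seed)
    acoustic, visual, labels = [], [], []
    for _ in range(n):
        label = rng.choice(["negative", "non-negative"])
        shift = 1.0 if label == "negative" else 0.0
        acoustic.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)])
        visual.append([rng.gauss(shift, 1.0)])
        labels.append(label)
    return acoustic, visual, labels


def demo():
    """Compare acoustic-only and fused accuracy on the toy data."""
    acoustic, visual, labels = make_toy_utterances(200)
    fused = fuse_features(acoustic, visual)
    split = 150  # first 150 utterances for training, the rest for testing
    acc_audio = accuracy(acoustic[:split], labels[:split],
                         acoustic[split:], labels[split:])
    acc_fused = accuracy(fused[:split], labels[:split],
                         fused[split:], labels[split:])
    return acc_audio, acc_fused


if __name__ == "__main__":
    acc_audio, acc_fused = demo()
    print(f"acoustic only: {acc_audio:.3f}, fused: {acc_fused:.3f}")
    print(f"relative change: {relative_improvement(acc_audio, acc_fused):.1f}%")
```

A relative improvement is measured against the baseline rather than in absolute percentage points, so the 3.8% reported in the abstract is a fraction of the acoustic-only score. Numbers produced by this toy script come from synthetic data and say nothing about the paper's results.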