IEEE Transactions on Audio, Speech, and Language Processing
Automatic recognition of a speaker's emotion within a spoken dialog system has received increased attention as demand grows for computer interfaces that provide natural, user-adaptive spoken interaction. This paper addresses the problem of automatically recognizing a child's emotional state using information obtained from audio and video signals. The study is based on a multimodal data corpus consisting of spontaneous conversations between a child and a computer agent. Four techniques were employed to classify utterances into two emotion classes, negative and non-negative, using both acoustic and visual information: a k-nearest neighbor (k-NN) classifier, a decision tree, a linear discriminant classifier (LDC), and a support vector machine classifier (SVC). Experimental results show that, overall, combining visual information with acoustic information improves emotion recognition performance. The best results were obtained when the information sources were combined at the feature level. Specifically, with feature-level fusion, adding visual information to acoustic information yields a relative improvement in emotion recognition of 3.8% for both the LDC and SVC classifiers over using acoustic information alone.
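As a rough illustration of what feature-level fusion means here, the sketch below concatenates each utterance's acoustic and visual feature vectors before classification, then applies a simple k-NN vote. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy feature values, and the choice of Euclidean distance are all hypothetical, and "relative improvement" is assumed to mean (fused - baseline) / baseline, which the abstract does not spell out.

```python
import math
from collections import Counter


def fuse_features(acoustic, visual):
    """Feature-level fusion: concatenate the two vectors of each utterance."""
    return [a + v for a, v in zip(acoustic, visual)]


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    # Euclidean distance is an assumption; the abstract names no metric.
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def relative_improvement(baseline, fused):
    """Assumed definition: gain of the fused score relative to the baseline."""
    return (fused - baseline) / baseline


if __name__ == "__main__":
    # Toy, made-up features: two acoustic values and one visual value each.
    acoustic = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
    visual = [[0.0], [0.1], [1.0], [0.9]]
    labels = ["non-negative", "non-negative", "negative", "negative"]

    fused = fuse_features(acoustic, visual)
    print(knn_predict(fused, labels, [0.85, 0.85, 0.95], k=3))
```

Concatenation lets a single classifier see both modalities at once, in contrast to schemes that train one classifier per modality and combine their outputs afterwards.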