In this paper, we present a human emotion recognition system based on audio and spatio-temporal visual features. The proposed system has been tested on an audio-visual emotion data set containing different subjects of both genders. Mel-frequency cepstral coefficient (MFCC) and prosodic features are first identified and then extracted from the emotional speech. For facial expressions, spatio-temporal features are extracted from the visual streams. Principal component analysis (PCA) is applied to reduce the dimensionality of the visual features while capturing 97% of the variance. Codebooks are constructed for both the audio and visual features using Euclidean distance. The histograms of codeword occurrences are then fed to state-of-the-art SVM classifiers to obtain the judgment of each modality. Finally, the judgments of the individual classifiers are combined using the Bayesian sum rule (BSR) as the decision step. The proposed system is tested on a public data set to recognize human emotions. Experimental results show that using visual features alone yields an average accuracy of 74.15%, while using audio features alone gives an average recognition accuracy of 67.39%. By combining both audio and visual features, the overall system accuracy improves significantly, to 80.27%.
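The pipeline described above can be sketched in a few steps: PCA retaining 97% of the variance, vector quantization of features against a codebook under Euclidean distance, a histogram of codeword occurrences per sample, and sum-rule fusion of the per-modality classifier posteriors. The following is a minimal NumPy sketch of these steps; all function names are hypothetical, and the per-class posteriors are assumed to come from SVM classifiers trained separately on the audio and visual histograms (the SVM training itself is omitted here).

```python
import numpy as np

def pca_reduce(X, var_ratio=0.97):
    """Project the rows of X onto the top principal components
    that capture `var_ratio` of the total variance (stand-in
    for the paper's PCA dimensionality-reduction step)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1
    return Xc @ Vt[:k].T

def quantize(features, codebook):
    """Assign each feature vector to its nearest codeword under
    Euclidean distance and return the normalized histogram of
    codeword occurrences (the per-sample SVM input)."""
    dists = np.linalg.norm(
        features[:, None, :] - codebook[None, :, :], axis=2)
    idx = dists.argmin(axis=1)
    hist = np.bincount(idx, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def sum_rule(p_audio, p_visual):
    """Fuse the per-class posteriors of the audio and visual
    classifiers with the sum rule; the predicted emotion is the
    argmax of the summed posteriors."""
    return int(np.argmax(p_audio + p_visual))
```

In this sketch the codebook itself would typically be learned by clustering (e.g. k-means) over the pooled training features; the sum rule then lets the stronger modality dominate only when its posterior is confidently peaked, which is consistent with the reported gain of the fused system over either modality alone.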