Combining acoustic features for improved emotion recognition in Mandarin speech

  • Authors:
  • Tsang-Long Pao, Yu-Te Chen, Jun-Heng Yeh, Wen-Yuan Liao

  • Affiliations:
  • Department of Computer Science and Engineering, Tatung University (all authors)

  • Venue:
  • ACII'05: Proceedings of the First International Conference on Affective Computing and Intelligent Interaction
  • Year:
  • 2005


Abstract

Combining different feature streams to obtain more accurate results is a well-known technique. The basic argument is that if the recognition errors of systems using the individual streams occur at different points, a combined system has at least a chance of correcting some of these errors by reference to the other streams. In an emotional speech recognition system, there are many ways in which this general principle can be applied. In this paper, we propose using feature selection and feature combination to improve speaker-dependent emotion recognition in Mandarin speech. Five basic emotions are investigated: anger, boredom, happiness, neutral and sadness. Combining multiple feature streams is clearly highly beneficial in our system. The best accuracy in recognizing the five emotions, 99.44%, is achieved by combining the MFCC, LPCC, RastaPLP and LFPC feature streams with the nearest class mean classifier.
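The paper itself gives no code, but the classification step it names can be sketched. The following is a minimal illustration, assuming (as one plausible scheme, not necessarily the authors' exact method) that each utterance is summarized as one fixed-length vector per feature stream, that streams are combined by simple concatenation, and that the nearest class mean classifier assigns an utterance to the emotion whose training-set mean vector is closest in Euclidean distance. The stream contents here are synthetic stand-ins, not real MFCC/LFPC features.

```python
import numpy as np

def nearest_class_mean_fit(X, y):
    """Compute one mean feature vector per class from training data."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def nearest_class_mean_predict(X, classes, means):
    """Assign each sample to the class whose mean is closest (Euclidean)."""
    # Pairwise distances, shape (n_samples, n_classes)
    d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

def combine_streams(*streams):
    """Combine per-utterance feature streams by concatenation (one option)."""
    return np.concatenate(streams, axis=1)

# Synthetic example: six utterances, two feature streams, two emotions.
rng = np.random.default_rng(0)
stream_a = rng.normal(size=(6, 4))   # stand-in for e.g. MFCC statistics
stream_b = rng.normal(size=(6, 3))   # stand-in for e.g. LFPC statistics
labels = np.array([0, 0, 0, 1, 1, 1])
stream_a[labels == 1] += 3.0         # make the two classes separable

X = combine_streams(stream_a, stream_b)
classes, means = nearest_class_mean_fit(X, labels)
pred = nearest_class_mean_predict(X, classes, means)
```

The appeal of the nearest class mean classifier in this setting is that it needs only one stored vector per emotion and no iterative training, so adding or dropping feature streams just changes the vector length.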