Emotion Recognition in Chinese Natural Speech by Combining Prosody and Voice Quality Features

Authors:
Shiqing Zhang
Affiliations:
School of Physics and Electronic Engineering, Taizhou University, Taizhou, China 318000
Venue:
ISNN '08 Proceedings of the 5th international symposium on Neural Networks: Advances in Neural Networks, Part II
Year:
2008

Abstract

Recognizing human emotion from speech is a challenging yet important speech technology. In this paper, in addition to prosody features derived from emotional speech, voice quality features are proposed as new emotional features to improve emotion recognition. Using a support vector machine (SVM) classifier, four emotions (anger, joy, sadness, and neutral) from a Chinese natural emotional speech corpus are discriminated by combining prosody and voice quality features. The experimental results show that combining the two feature sets yields an overall emotion recognition accuracy of 76%, an improvement of approximately 10% over using prosody features alone. They also show that voice quality features are effective emotional features that can complement prosody features and improve emotion recognition results.
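The abstract does not say which prosody or voice quality measures are extracted, so the following is only a minimal sketch under stated assumptions. Here, prosody is summarized by pitch (F0) mean, standard deviation, and range, plus energy mean and standard deviation. Voice quality is represented by local jitter and shimmer, computed from per-cycle pitch periods and peak amplitudes. All function names are hypothetical. The sketch covers only the feature-combination step: concatenating the two sets, then z-score normalizing each feature across utterances. The resulting vectors would then go to an SVM classifier, which is not shown.

```python
import statistics
from typing import List, Sequence


def local_jitter(periods: Sequence[float]) -> float:
    """Assumed voice quality measure: mean absolute difference of consecutive
    pitch periods, divided by the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return statistics.fmean(diffs) / statistics.fmean(periods)


def local_shimmer(amplitudes: Sequence[float]) -> float:
    """Assumed voice quality measure: mean absolute difference of consecutive
    cycle peak amplitudes, divided by the mean amplitude."""
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return statistics.fmean(diffs) / statistics.fmean(amplitudes)


def prosody_features(f0: Sequence[float], energy: Sequence[float]) -> List[float]:
    """Hypothetical prosody set: F0 mean, std, range; energy mean, std.
    Assumes f0 contains voiced frames only (no zeros for unvoiced frames)."""
    return [
        statistics.fmean(f0),
        statistics.pstdev(f0),
        max(f0) - min(f0),
        statistics.fmean(energy),
        statistics.pstdev(energy),
    ]


def voice_quality_features(periods: Sequence[float],
                           amplitudes: Sequence[float]) -> List[float]:
    """Hypothetical voice quality set: local jitter and local shimmer."""
    return [local_jitter(periods), local_shimmer(amplitudes)]


def combined_features(f0, energy, periods, amplitudes) -> List[float]:
    """Feature-level fusion: prosody vector followed by voice quality vector."""
    return prosody_features(f0, energy) + voice_quality_features(periods, amplitudes)


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Normalize each feature (column) to zero mean and unit variance across
    utterances; a constant column is left centered with scale 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


if __name__ == "__main__":
    # Two toy utterances with made-up measurements, for illustration only.
    utt_a = combined_features([100.0, 120.0, 110.0], [0.5, 0.7, 0.6],
                              [10.0, 12.0, 10.0], [1.0, 0.8, 1.0])
    utt_b = combined_features([180.0, 220.0, 200.0], [0.9, 1.1, 1.0],
                              [5.0, 5.1, 5.0], [1.0, 0.9, 1.0])
    print(zscore_columns([utt_a, utt_b]))
```

Per-feature normalization matters here because the fused vector mixes quantities on very different scales (F0 in Hz versus dimensionless jitter and shimmer ratios), and an SVM's margin and kernel distances are sensitive to such scale differences.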