Investigating glottal parameters for differentiating emotional categories with similar prosodics

  • Authors:
  • Rui Sun, Elliot Moore, Juan F. Torres

  • Affiliations:
  • Georgia Institute of Technology, School of Electrical and Computer Engineering, 210 Technology Circle, Savannah, GA 31407, USA (all authors)

  • Venue:
  • ICASSP '09: Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009

Abstract

Speech prosodics (e.g., pitch and energy) play an important role in the interpretation of emotional expression. However, certain pairs of emotions can be difficult to discriminate because they display similar tendencies in prosodic statistics. The purpose of this paper is to target speaker-dependent expressions of emotion pairs that share statistically similar prosodic information and to investigate a set of glottal features for their ability to find measurable differences in these expressions. Evaluation is based on acted emotional utterances from the Emotional Prosody Speech and Transcripts (EPST) database. While acted speech is in no way assumed to provide a complete picture of authentic emotion, its value here is that the actors adjusted their voice quality to fit their perception of different emotions. Results show statistically significant differences (p ≪ 0.05) in at least one glottal feature for all 30 emotion pairs in which prosodic features did not show a significant difference. In addition, using a single glottal feature reduced classification error for 24 emotion pairs in comparison to pitch or energy.
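
The abstract does not state which significance test or classifier the authors used, so the following is a minimal illustrative sketch only, not the paper's method: it screens a single hypothetical glottal feature for a significant difference between two emotion classes (here via Welch's t-test, an assumed choice) and then scores a naive single-feature threshold classifier. All variable names and the synthetic data are hypothetical.

```python
# Illustrative sketch (not the authors' code): screen one feature for a
# significant difference between two emotion classes, then measure the
# error of a single-feature threshold classifier.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-utterance values of one glottal feature for two emotions
# from one speaker; in the paper these would be extracted from EPST speech.
feat_emotion_a = rng.normal(loc=0.42, scale=0.05, size=40)
feat_emotion_b = rng.normal(loc=0.47, scale=0.05, size=40)

# Significance screen: does this feature separate the two emotions?
# (Welch's t-test is our assumption; the paper's test is not named here.)
t_stat, p_value = stats.ttest_ind(feat_emotion_a, feat_emotion_b, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # flag the feature if p < 0.05

# Naive single-feature classifier: threshold at the midpoint of class means.
threshold = (feat_emotion_a.mean() + feat_emotion_b.mean()) / 2
correct_a = (feat_emotion_a < threshold).sum()   # class A assumed below threshold
correct_b = (feat_emotion_b >= threshold).sum()
error = 1 - (correct_a + correct_b) / (len(feat_emotion_a) + len(feat_emotion_b))
print(f"classification error = {error:.2%}")
```

Repeating such a screen per speaker and per emotion pair, and comparing the resulting error against the same classifier built on pitch or energy, mirrors the kind of pairwise comparison the abstract reports.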