A novel emotion recognizer from speech using both prosodic and linguistic features

  • Authors:
  • Motoyuki Suzuki; Seiji Tsuchiya; Fuji Ren

  • Affiliations:
  • Institute of Technology and Science, The University of Tokushima, Tokushima, Japan
  • Department of Intelligent Information Engineering and Sciences, Doshisha University, Kyotanabe, Kyoto, Japan
  • Institute of Technology and Science, The University of Tokushima, Tokushima, Japan

  • Venue:
  • KES'11 Proceedings of the 15th international conference on Knowledge-based and intelligent information and engineering systems - Volume Part I
  • Year:
  • 2011

Abstract

Emotion recognition from speech generally relies on prosodic information. However, utterances expressing different emotions often have similar prosodic features, so it is difficult to recognize emotion from prosodic features alone. In this paper, we propose a novel approach to emotion recognition that considers both prosodic and linguistic features. First, a clustering-based emotion recognizer, which uses only prosodic features, outputs a set of possible emotions. Then, the transcript produced by a speech recognizer is input to another emotion recognizer based on the "Association Mechanism," which outputs a possible emotion using only linguistic information. Finally, the intersection of the two sets of possible emotions is taken as the final result. Experimental results showed that the proposed method achieved higher performance than either prosodic- or linguistic-based emotion recognition alone. Against manually labeled data, the F-measure was 32.6%; for comparison, the average F-measure of labels given by other human annotators was 42.9%. This means the proposed method performed at 75.9% of human ability.
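The fusion step described above can be sketched as a simple set intersection. The function name, the candidate emotion labels, and the fallback behavior when the intersection is empty are illustrative assumptions; the paper itself defines the actual integration rule.

```python
def fuse_emotions(prosodic_candidates, linguistic_candidates):
    """Combine candidates from the two recognizers.

    Returns the intersection of the prosodic and linguistic candidate
    sets. Falling back to the prosodic candidates when the intersection
    is empty is an assumption for this sketch, not the paper's rule.
    """
    overlap = set(prosodic_candidates) & set(linguistic_candidates)
    return overlap if overlap else set(prosodic_candidates)


# Example: the prosodic recognizer cannot distinguish "joy" from
# "surprise", but the linguistic recognizer rules out "surprise".
print(fuse_emotions({"joy", "surprise"}, {"joy", "anger"}))  # {'joy'}
```

Taking the intersection means an emotion is output only when both the prosodic and the linguistic evidence support it, which is consistent with the reported gain over either recognizer alone.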