A three-layered model for expressive speech perception

  • Authors:
  • Chun-Fang Huang; Masato Akagi

  • Affiliations:
  • School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), 1-1 Asahidai, Nomi, Ishikawa 923-1211, Japan (both authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2008

Abstract

This paper proposes a multi-layer approach to modeling the perception of expressive speech. Many earlier studies of expressive speech focused on statistical correlations between expressive speech and acoustic features without taking into account the fact that human perception is vague rather than precise. This paper introduces a three-layer model: five categories of expressive speech constitute the top layer; semantic primitives, the middle layer; and acoustic features, the bottom layer. Three experiments followed by multidimensional scaling analysis identified suitable semantic primitives. Fuzzy inference systems were then built to capture the vagueness of the relationship between expressive speech and the semantic primitives. Acoustic features in terms of F0 contour, duration, power envelope, and spectrum were analyzed. Regression analysis revealed correlations between the semantic primitives and the acoustic features. Parameterized rules based on the analysis results were created to morph neutral utterances into ones perceived as having different semantic primitives and expressive-speech categories. Verification experiments showed significant relationships among expressive speech, semantic primitives, and acoustic features.
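The middle-to-top mapping described above (semantic primitives to expressive-speech categories via fuzzy inference) can be sketched in a toy form. This is a minimal illustrative assumption, not the paper's fitted system: the primitive name ("bright"), the category ("joy"), the triangular membership shapes, and the two rules are all hypothetical choices made here for demonstration.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def joy_degree(bright):
    """Toy fuzzy inference with two rules and weighted-average defuzzification.

    Rule 1: IF bright is LOW  THEN joy is LOW  (output level 0.2)
    Rule 2: IF bright is HIGH THEN joy is HIGH (output level 0.8)
    Input `bright` is a hypothetical semantic-primitive rating on [0, 1].
    """
    w_low = tri(bright, -0.5, 0.0, 1.0)   # firing strength of the LOW rule
    w_high = tri(bright, 0.0, 1.0, 1.5)   # firing strength of the HIGH rule
    if w_low + w_high == 0.0:
        return 0.0
    # Weighted average of the rule-output levels (zero-order Sugeno style).
    return (w_low * 0.2 + w_high * 0.8) / (w_low + w_high)

print(joy_degree(0.0))  # only the LOW rule fires -> 0.2
print(joy_degree(1.0))  # only the HIGH rule fires -> 0.8
print(joy_degree(0.5))  # both rules fire equally -> 0.5
```

The point of the fuzzy formulation, as in the paper, is that intermediate primitive ratings yield graded category membership rather than a hard classification.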