A three-layered model for expressive speech perception

  • Authors:
  • Chun-Fang Huang; Masato Akagi

  • Affiliations:
  • School of Information Science, Japan Advanced Institute of Science and Technology (JAIST), 1-1 Asahidai, Nomi, Ishikawa 923-1211, Japan (both authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2008

Abstract

This paper proposes a multi-layer approach to modeling the perception of expressive speech. Many earlier studies of expressive speech focused on statistical correlations between expressive speech and acoustic features without taking into account the fact that human perception is vague rather than precise. This paper introduces a three-layer model: five categories of expressive speech constitute the top layer; semantic primitives, the middle layer; and acoustic features, the bottom layer. Three experiments followed by multidimensional scaling analysis identified suitable semantic primitives. Fuzzy inference systems were then built to capture the vagueness of the relationship between expressive speech and the semantic primitives. Acoustic features in terms of F0 contour, duration, power envelope, and spectrum were analyzed. Regression analysis revealed correlations between the semantic primitives and the acoustic features. Parameterized rules based on the analysis results were created to morph neutral utterances into ones perceived as having different semantic primitives and expressive-speech categories. Verification experiments showed significant relationships among expressive speech, semantic primitives, and acoustic features.
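The middle-to-top mapping described above (semantic primitives to expressive-speech categories via fuzzy inference) can be sketched in a toy form. This is a minimal illustrative assumption, not the paper's fitted system: the primitive name ("bright"), the category ("joy"), the triangular membership shapes, and the two rules are all hypothetical choices made here for demonstration.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def joy_degree(bright):
    """Toy fuzzy inference with two rules and weighted-average defuzzification.

    Rule 1: IF bright is LOW  THEN joy is LOW  (output level 0.2)
    Rule 2: IF bright is HIGH THEN joy is HIGH (output level 0.8)
    Input `bright` is a hypothetical semantic-primitive rating on [0, 1].
    """
    w_low = tri(bright, -0.5, 0.0, 1.0)   # firing strength of the LOW rule
    w_high = tri(bright, 0.0, 1.0, 1.5)   # firing strength of the HIGH rule
    if w_low + w_high == 0.0:
        return 0.0
    # Weighted average of the rule-output levels (zero-order Sugeno style).
    return (w_low * 0.2 + w_high * 0.8) / (w_low + w_high)

print(joy_degree(0.0))  # only the LOW rule fires -> 0.2
print(joy_degree(1.0))  # only the HIGH rule fires -> 0.8
print(joy_degree(0.5))  # both rules fire equally -> 0.5
```

The point of the fuzzy formulation, as in the paper, is that intermediate primitive ratings yield graded category membership rather than a hard classification.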