Automatic extraction of paralinguistic information using prosodic features related to F0, duration and voice quality

  • Authors:
  • Carlos Toshinori Ishi; Hiroshi Ishiguro; Norihiro Hagita

  • Affiliations:
  • ATR Intelligent Robotics and Communication Labs, 2-2 Hikaridai "Keihanna Science City", Kyoto 619-0288, Japan (all authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2008

Abstract

The use of acoustic-prosodic features related to F0, duration, and voice quality is proposed and evaluated for automatic extraction of paralinguistic information (intentions, attitudes, and emotions) in dialogue speech. Perceptual experiments and acoustic analyses were conducted for monosyllabic interjections spoken in several speaking styles, conveying a variety of paralinguistic information. In a task of discriminating seven groups of paralinguistic information, experimental results indicated that the classical prosodic features, i.e., F0 and duration, were effective for discriminating groups expressing intentions, such as affirm, deny, filler, and ask for repetition, and accounted for 57% of the global detection rate. Voice quality features, in contrast, were effective for identifying part of the paralinguistic information expressing emotions or attitudes, such as surprised, disgusted, and admired, leading to a 12% improvement in the global detection rate.
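
As an illustration of the two classical prosodic features named in the abstract, the sketch below estimates F0 for a single voiced frame and measures the duration of a segment. It is not the authors' extraction procedure, which the abstract does not describe: autocorrelation peak picking is a generic textbook technique swapped in here, and the function names (`estimate_f0`, `duration_seconds`), the F0 search range, and the synthetic test tone are all assumptions made for the example. Voice quality features are not covered.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate F0 in Hz for one voiced frame via the autocorrelation peak.

    Hypothetical helper: the search range (f0_min, f0_max) is an assumed
    default, not a value taken from the paper.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC offset before correlating

    # Candidate lags (in samples) corresponding to the allowed F0 range.
    lag_min = max(1, int(sample_rate / f0_max))
    lag_max = min(int(sample_rate / f0_min), n - 1)

    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val

    # Return 0.0 when no positive peak was found (treated as unvoiced).
    return sample_rate / best_lag if best_lag else 0.0


def duration_seconds(samples, sample_rate):
    """Duration of a segment in seconds, given its samples."""
    return len(samples) / sample_rate


if __name__ == "__main__":
    # Synthetic stand-in for a voiced interjection: a 200 Hz tone, 50 ms long.
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * i / sr) for i in range(400)]
    print(estimate_f0(tone, sr), duration_seconds(tone, sr))
```

In practice, a per-frame F0 track like this would be summarized over the whole interjection (for example, its overall movement) and combined with duration before classification; how the paper does that is not stated in the abstract.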