The Utsunomiya University (UU) Spoken Dialogue Database for Paralinguistic Information Studies is introduced. The UU Database is intended especially for studying the usage, structure, and effects of paralinguistic information in expressive Japanese conversational speech. Paralinguistic information refers to meaningful information, such as emotion or attitude, delivered along with the linguistic message. The UU Database provides labels of perceived emotional states for all utterances. The emotional states were annotated along six abstract dimensions: pleasant-unpleasant, aroused-sleepy, dominant-submissive, credible-doubtful, interested-indifferent, and positive-negative. To elicit expressively rich and vivid conversation, a "four-frame cartoon sorting task" was devised: four cards, each containing one frame extracted from a cartoon, are shuffled, and each participant, holding two of the four cards, must estimate the original order. The effectiveness of the method was supported by a broad distribution of subjective emotional-state ratings. Preliminary annotation experiments with a large number of annotators confirmed that most annotators could provide fairly consistent ratings for a repeated identical stimulus, and inter-rater agreement was good (W ≈ 0.5) for three of the six dimensions. Based on these results, three annotators were selected to label all 4840 utterances, and their high degree of agreement was verified using measures such as Kendall's coefficient of concordance W. Correlation analyses showed that not only prosodic parameters such as intensity and F0 but also a voice-quality parameter were related to the dimensions. Multiple correlation coefficients above 0.7 and RMS errors of about 0.6 were obtained when recognizing some dimensions from linear combinations of the speech parameters. Overall, the perceived emotional states of speakers could be estimated from the speech parameters with reasonable accuracy in most cases.
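The inter-rater agreement figure (W ≈ 0.5) refers to Kendall's coefficient of concordance. The paper's exact computation is not reproduced here, but the standard statistic can be sketched as follows; the data are synthetic and, for simplicity, tied ratings are not handled:

```python
import numpy as np

def kendalls_w(ratings):
    """Kendall's coefficient of concordance W for m raters over n items.

    ratings: (m_raters, n_items) array of scores; each rater's scores
    are converted to ranks (ties ignored in this sketch).
    W = 1 means perfect agreement among raters; W = 0 means none.
    """
    ratings = np.asarray(ratings, dtype=float)
    m, n = ratings.shape
    # convert each rater's scores to ranks 1..n
    ranks = np.argsort(np.argsort(ratings, axis=1), axis=1) + 1
    rank_sums = ranks.sum(axis=0)  # total rank received by each item
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (m ** 2 * (n ** 3 - n))

# three hypothetical raters in perfect agreement -> W = 1
identical = np.tile(np.arange(10), (3, 1))
print(round(kendalls_w(identical), 3))  # -> 1.0
```

With independent random ratings W falls toward 0, so a value near 0.5 across many utterances, as reported for three of the six dimensions, indicates substantial but imperfect concordance.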
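The recognition result fits each emotional dimension with a linear combination of speech parameters. The actual features and corpus data are not available here, so the sketch below uses synthetic stand-ins for intensity, F0, and a voice-quality measure, and reports the multiple correlation R and RMS error analogous to the figures quoted in the abstract:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# synthetic stand-ins for per-utterance speech parameters:
# column 0 ~ intensity, column 1 ~ mean F0, column 2 ~ a voice-quality measure
X = rng.normal(size=(n, 3))
# synthetic perceived-emotion ratings for one dimension (hypothetical weights)
y = X @ np.array([0.8, 0.5, 0.3]) + rng.normal(scale=0.5, size=n)

# least-squares fit of a linear combination of the parameters plus intercept
Xb = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
pred = Xb @ coef

# multiple correlation coefficient and RMS error of the fitted ratings
r = np.corrcoef(pred, y)[0, 1]
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(f"R = {r:.2f}, RMSE = {rmse:.2f}")
```

Here R is the correlation between predicted and observed ratings, so R > 0.7 with RMS error around 0.6 rating units, as in the abstract, means the linear model captures most of the rated variation for those dimensions.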