Automated assessment of prosody production

Authors:
Jan P. H. van Santen; Emily Tucker Prud'hommeaux; Lois M. Black
Affiliations:
Center for Spoken Language Understanding (CSLU), Division of Biomedical Computer Science (BMCS), School of Medicine, Oregon Health and Science University, 20000 NW Walker Road, Beaverton, OR 97006 ...
Venue:
Speech Communication
Year:
2009

Abstract

Assessment of prosody is important for diagnosis and remediation of speech and language disorders, for diagnosis of neurological conditions, and for foreign language instruction. Current assessment is largely auditory-perceptual, which has obvious drawbacks; however, automation of assessment faces numerous obstacles. We propose methods for automatically assessing production of lexical stress, focus, phrasing, pragmatic style, and vocal affect. We analyzed speech from children performing six tasks designed to elicit specific prosodic contrasts. The methods involve dynamic and global features, using spectral, fundamental frequency, and temporal information. The automatically computed scores were validated against mean scores from judges who, in all but one task, listened to "prosodic minimal pairs" of recordings. Each pair contained two utterances from the same child with approximately the same phonemic material but differing on a specific prosodic dimension, such as stress. The judges identified the prosodic categories of the two utterances and rated the strength of their contrast. For almost all tasks, the automated scores correlated with the mean scores approximately as well as the judges' individual scores did. Real-time scores assigned during examination, as is fairly typical in speech assessment, correlated substantially less strongly with the mean scores than the automated scores did.
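
The validation described in the abstract compares two kinds of correlation: the automated scores against the judges' mean scores, and each judge's own scores against the mean. The following is a minimal sketch of such a comparison and not the authors' procedure. The function names, the choice of the Pearson coefficient, and the leave-one-out mean used for individual judges are all assumptions made for illustration; the abstract does not specify any of them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / sqrt(sxx * syy)


def mean_scores(judge_scores, exclude=None):
    """Per-item mean across judges, optionally leaving one judge out.

    judge_scores is a list of per-judge lists, all aligned by item.
    """
    rows = [s for i, s in enumerate(judge_scores) if i != exclude]
    return [sum(col) / len(col) for col in zip(*rows)]


def compare_to_judges(automated, judge_scores):
    """Return (automated-vs-mean r, list of judge-vs-mean-of-others r).

    Assumption: each judge is compared with the mean of the *other*
    judges, so that a judge's own ratings do not inflate the agreement.
    """
    auto_r = pearson(automated, mean_scores(judge_scores))
    judge_r = [
        pearson(scores, mean_scores(judge_scores, exclude=i))
        for i, scores in enumerate(judge_scores)
    ]
    return auto_r, judge_r


if __name__ == "__main__":
    # Made-up ratings for five items from three hypothetical judges.
    judges = [
        [1.0, 2.0, 3.0, 4.0, 5.0],
        [1.5, 2.5, 2.5, 4.5, 4.0],
        [0.5, 2.0, 3.5, 3.5, 5.0],
    ]
    automated = [1.2, 1.9, 3.1, 4.2, 4.6]  # made-up automated scores
    auto_r, judge_r = compare_to_judges(automated, judges)
    print("automated vs. mean:", round(auto_r, 3))
    print("each judge vs. mean of others:", [round(r, 3) for r in judge_r])
```

Under these assumptions, an automated method performs "as well as a judge" when its correlation falls within the range of the per-judge correlations. The same function could be applied to real-time examiner scores by passing them in place of `automated`.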