Perception of stress and speaking style for selected elements of the SUSAS database

Authors:
Robert S. Bolia;Raymond E. Slyh
Affiliations:
Air Force Research Laboratory (AFRL/HECP), 2255 H Street, Wright-Patterson Air Force Base, OH;Air Force Research Laboratory (AFRL/HECP), 2255 H Street, Wright-Patterson Air Force Base, OH
Venue:
Speech Communication
Year:
2003

Abstract

The Speech Under Simulated and Actual Stress (SUSAS) database is a collection of utterances recorded under conditions of simulated or actual stress, the purpose of which is to allow researchers to study the effects of stress and speaking style on the speech waveform. The aim of the present investigation was to assess the perceptual validity of the simulated portion of the database by determining the extent to which listeners classify its utterances according to their assigned labels. Seven listeners performed an eight-alternative, forced-choice task, judging whether monosyllabic or disyllabic words spoken by talkers from three different regional accent classes (Boston, General American, New York) were best classified as angry, clear, fast, loud, neutral, question, slow, or soft. Mean percentages of "correct" judgments were analysed using a 3 (regional accent class) × 2 (number of syllables) × 8 (speaking style) repeated measures analysis of variance. Results indicate that, overall, listeners correctly classified the utterances only 58% of the time, and that the percentage of correct classifications varied as a function of all three independent variables.
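
To make the "percentage of correct judgments" measure concrete, the following is a minimal sketch of how such a score might be tallied from forced-choice responses, and how it compares with the chance level of an eight-alternative task. The function name `percent_correct` and the trial list are hypothetical; the trials are made-up values for illustration only, not data from the study, and the sketch does not reproduce the repeated measures analysis of variance.

```python
from collections import defaultdict

N_ALTERNATIVES = 8  # eight-alternative forced choice, as stated in the abstract


def percent_correct(trials):
    """Return (overall, per_label) percentages of responses matching the assigned label.

    `trials` is an iterable of (assigned_label, listener_response) pairs.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for label, response in trials:
        totals[label] += 1
        if response == label:
            hits[label] += 1
    per_label = {label: 100.0 * hits[label] / totals[label] for label in totals}
    overall = 100.0 * sum(hits.values()) / sum(totals.values())
    return overall, per_label


# Made-up trials (assigned label, listener response); NOT data from the study.
toy_trials = [
    ("loud", "loud"), ("loud", "fast"),
    ("soft", "soft"), ("soft", "soft"),
    ("clear", "neutral"), ("clear", "slow"),
    ("question", "question"), ("question", "question"),
]

overall, per_label = percent_correct(toy_trials)
chance = 100.0 / N_ALTERNATIVES  # guessing rate if all alternatives are equally likely

print(f"overall: {overall:.1f}%  chance: {chance:.1f}%")
for label, pct in sorted(per_label.items()):
    print(f"{label}: {pct:.1f}%")
```

Under the assumption that a guessing listener picks each alternative equally often, chance performance is 12.5%, so the reported 58% is well above guessing while still leaving a large share of utterances classified differently from their assigned labels.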