Prosody-preserving voice transformation to evaluate brain representations of speech sounds

  • Authors:
  • Purvis Bedenbaugh; Diana K. Sarko; Heidi L. Roth; Eugene M. Martin

  • Affiliations:
  • Department of Engineering, East Carolina University, Greenville, NC; Department of Biology, Vanderbilt University, Nashville, TN; Department of Neurology, University of North Carolina, Chapel Hill, NC; Laboratory of Neurobiology and Behavior, Rockefeller University, New York, NY

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2010

Abstract

This study uses a voice transformation to overcome the limitations of brain mapping as a method for studying brain representations of natural sounds such as speech. Brain mapping studies of natural sound representations, which present a fixed sound to many neurons with different acoustic frequency selectivity, are difficult to interpret because individual neurons exhibit considerable unexplained variability in the dynamics of their evoked responses. The new approach samples how a single recording responds to an ensemble of sounds, instead of sampling an ensemble of neuronal recordings. A noise-excited filter-bank analysis and resynthesis vocoder systematically shifts the frequency band occupied by the sounds in the ensemble. The quality of the voice transformation is assessed by determining the number of bands the filter bank must have to support identification of emotional prosody. Perceptual data show that emotional prosody can be recognized within normal limits if the bandwidth of the filter-bank channels is less than or equal to the bandwidth of perceptual auditory filters. Example physiological data show that stationary linear transfer functions cannot fully explain the responses of central auditory neurons to speech sounds, and that deviations from model predictions are not random; these deviations may be related to acoustic or articulatory features of speech.