Nonlinear dynamic modeling of the voiced excitation for improved speech synthesis

  • Authors:
  • K. Narasimhan; J. C. Principe; D. G. Childers

  • Affiliations:
  • Comput. Neuroeng. Lab., Florida Univ., Gainesville, FL, USA

  • Venue:
  • ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
  • Year:
  • 1999

Abstract

This paper describes the implementation of a waveform-based global dynamic model with the goal of capturing vocal fold variability. The residue extracted from speech by inverse filtering is pre-processed to remove phoneme dependence and is used as the input time series to the dynamic model. After training, the dynamic model is seeded with a point from the trajectory of the time series and iterated to produce the synthetic excitation waveform. The output of the dynamic model is compared with the input time series; these comparisons confirmed that the dynamic model had captured the variability in the residue. The output of the dynamic model is then used to synthesize speech with a pitch-synchronous speech synthesizer, and the synthesized speech is observed to be close to natural speech.
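The seed-and-iterate procedure the abstract describes can be sketched in a few lines. The sketch below is an assumption-laden stand-in, not the paper's actual model: a synthetic residue-like signal replaces the inverse-filtered residue, and a simple nearest-neighbor predictor in a time-delay embedding stands in for the trained global dynamic model. The sketch shows only the mechanics of training on a time series, seeding with a point from its trajectory, and iterating autonomously to generate a synthetic excitation waveform.

```python
import numpy as np

# Hypothetical stand-in for the glottal residue time series;
# in the paper this comes from inverse-filtering real speech.
rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)
residue = np.sin(2 * np.pi * t / 80) ** 3 + 0.05 * rng.standard_normal(n)

# Time-delay embedding: predict sample x[k] from the previous d samples.
d = 8
X = np.column_stack([residue[i:n - d + i] for i in range(d)])  # states
y = residue[d:]                                                # successors

def predict(state):
    # Nearest-neighbor predictor: a simple placeholder for the paper's
    # trained dynamic model; returns the successor of the closest
    # training state in the delay-embedded space.
    i = np.argmin(np.sum((X - state) ** 2, axis=1))
    return y[i]

# Seed with a point from the trajectory and iterate to produce
# the synthetic excitation waveform.
state = X[0].copy()
synthetic = []
for _ in range(400):
    nxt = predict(state)
    synthetic.append(nxt)
    state = np.roll(state, -1)  # slide the delay window forward
    state[-1] = nxt
synthetic = np.array(synthetic)
```

The synthetic waveform can then be compared against the input series (as the paper does) before driving a pitch-synchronous synthesizer; a real implementation would replace the nearest-neighbor lookup with the trained global model.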