Transition-based speech synthesis using neural networks

  • Authors:
  • G. Corrigan; N. Massey; O. Schnurr

  • Affiliations:
  • Motorola Labs, Schaumburg, IL, USA

  • Venue:
  • ICASSP '00: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02
  • Year:
  • 2000

Abstract

Prior attempts to use neural networks to synthesize speech from a phonetic representation have used the network to generate a frame of input to a vocoder. Because the network must then compute one output for every frame of vocoded speech, this approach can be computationally expensive. An alternative is to model the speech as a series of gestures and have the neural network generate parameters describing how the vocoder parameters vary during each gesture. Experiments have shown that acceptable speech quality is produced when each gesture is half of a phonetic segment and the transition model is a set of cubic polynomials describing the variation of each vocoder parameter during the gesture. This yields a significant reduction in computational cost.
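The transition model described in the abstract can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes the network emits four cubic coefficients per vocoder parameter per gesture (one half-phone), and the trajectory is then evaluated cheaply at each frame. The function name, array shapes, and frame counts are all hypothetical.

```python
import numpy as np

def gesture_trajectory(coeffs, n_frames):
    """Evaluate cubic-polynomial transitions for one gesture (half-phone).

    coeffs: (n_params, 4) array of hypothetical cubic coefficients
            [a0, a1, a2, a3] for each vocoder parameter, as a network
            might emit once per gesture.
    n_frames: number of vocoder frames spanned by the gesture.
    Returns an (n_frames, n_params) trajectory of vocoder parameters.
    """
    # Normalized time t in [0, 1] across the gesture.
    t = np.linspace(0.0, 1.0, n_frames)
    # Polynomial basis [1, t, t^2, t^3] evaluated at each frame.
    basis = np.stack([np.ones_like(t), t, t**2, t**3], axis=1)  # (n_frames, 4)
    # One matrix product recovers every parameter's trajectory:
    # the network runs once per gesture, not once per frame.
    return basis @ coeffs.T  # (n_frames, n_params)

# Example: 10 vocoder parameters over a 10-frame gesture.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((10, 4))
traj = gesture_trajectory(coeffs, n_frames=10)
print(traj.shape)  # (10, 10)
```

The computational saving follows directly: a frame-by-frame network is evaluated once per vocoder frame, while here the (comparatively expensive) network runs once per gesture and only the cheap polynomial evaluation runs per frame.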