Hierarchical prosody conversion using regression-based clustering for emotional speech synthesis
IEEE Transactions on Audio, Speech, and Language Processing
ACII'05 Proceedings of the First international conference on Affective Computing and Intelligent Interaction
We present a new algorithm for adjusting the magnitude spectrum when the fundamental frequency (F0) of a speech signal is altered. The algorithm exploits the correlation between F0 and the magnitude spectrum of speech as represented by line spectral frequencies (LSFs). This correlation is class-dependent, so the input is first broadly classified by a Gaussian mixture model (GMM). The within-class dependencies of LSFs on F0 values are captured by constructing their joint probability densities using a series of GMMs, one for each speech class. The proposed system is used for post-processing the pitch-modified signal. Perceptual tests showed that adding this post-processing system improves the naturalness of the pitch-modified signal for large pitch modification factors.
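
The abstract does not say how the joint densities are used once they are built. One common way to exploit a joint GMM over (F0, LSF) is to take the conditional expectation of the LSFs given the modified F0. The sketch below illustrates that idea only; it is not taken from the paper. It covers a single speech class, and the function names, the parameter layout, and all numeric values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def expected_lsf_given_f0(f0, components):
    """Conditional mean E[LSF | F0 = f0] under a joint GMM over (F0, LSF).

    Each component is a dict (hypothetical layout) with:
      weight    : mixture weight
      f0_mean   : mean of F0 in this component (Hz)
      f0_var    : variance of F0 in this component
      lsf_mean  : list of mean LSF values
      cross_cov : list of covariances between each LSF and F0
    Because F0 is a scalar, no matrix inversion is needed.
    """
    # Posterior probability of each component given the observed F0.
    likelihoods = [c["weight"] * gaussian_pdf(f0, c["f0_mean"], c["f0_var"])
                   for c in components]
    total = sum(likelihoods)
    posteriors = [lik / total for lik in likelihoods]

    dim = len(components[0]["lsf_mean"])
    result = [0.0] * dim
    for post, c in zip(posteriors, components):
        # Per-component linear regression of the LSFs on F0.
        gain = (f0 - c["f0_mean"]) / c["f0_var"]
        for i in range(dim):
            result[i] += post * (c["lsf_mean"][i] + c["cross_cov"][i] * gain)
    return result

# Toy parameters for one speech class (illustrative values only).
TOY_COMPONENTS = [
    {"weight": 0.5, "f0_mean": 100.0, "f0_var": 100.0,
     "lsf_mean": [0.30, 0.90], "cross_cov": [0.0, 0.0]},
    {"weight": 0.5, "f0_mean": 200.0, "f0_var": 100.0,
     "lsf_mean": [0.40, 1.00], "cross_cov": [0.0, 0.0]},
]

if __name__ == "__main__":
    # LSFs predicted for a target F0 halfway between the two components.
    print(expected_lsf_given_f0(150.0, TOY_COMPONENTS))
```

In a full system along the lines the abstract describes, a separate classifier GMM would first choose the speech class, and the component set for that class would then be passed to a function like this one.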