First Steps towards New Czech Voice Conversion System

Authors:
Zdeněk Hanzlíček; Jindřich Matoušek
Affiliations:
Department of Cybernetics, University of West Bohemia, Plzeň, Czech Republic (both authors)
Venue:
TSD'06: Proceedings of the 9th International Conference on Text, Speech and Dialogue
Year:
2006

Abstract

In this paper, we describe initial experiments on creating a new Czech voice conversion system. Voice conversion (VC) is a process that modifies the speech signal produced by one (source) speaker so that it sounds like another (target) speaker. Using VC, a new voice for a speech synthesizer can be prepared without recording a large amount of new speech data. The transformation is determined from the same sentences uttered by both speakers; these sentences are time-aligned using a modified dynamic time warping (DTW) algorithm. The conversion is divided into two stages corresponding to the source-filter model of speech production. In this work, we employ a conversion function based on a Gaussian mixture model (GMM) to transform the spectral envelope, which is described by line spectral frequencies (LSFs). Residua are converted using so-called residual prediction techniques. Unlike other similar work, we predict the residua not from the transformed spectral envelope but directly from the source speech. Four versions of residual prediction are described and compared in this study. Objective evaluation of the converted speech using performance metrics shows that our system is comparable with similar existing VC systems.
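The abstract states that the parallel sentences are time-aligned with a modified dynamic time warping algorithm, but the modification itself is not described. The sketch below therefore shows only plain DTW, assuming each frame is a feature vector (for example, an LSF vector), Euclidean frame distance, and the symmetric step set {(1,0), (0,1), (1,1)}. The function name `dtw_align` is hypothetical and this is not the authors' modified algorithm.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(source: List[Vector], target: List[Vector]) -> List[Tuple[int, int]]:
    """Minimum-cost monotonic alignment as (source_idx, target_idx) pairs.

    Plain DTW (hypothetical helper); the paper's modification is not reproduced.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j] = cheapest accumulated distance aligning source[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(source[i - 1], target[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the last frame pair, always moving to the cheapest predecessor
    # (ties favour the diagonal step because it is listed first).
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return path


if __name__ == "__main__":
    # The repeated target frame 0.0 is matched to the first source frame twice.
    print(dtw_align([[0.0], [1.0], [2.0]], [[0.0], [0.0], [1.0], [2.0]]))
    # -> [(0, 0), (0, 1), (1, 2), (2, 3)]
```

Each returned pair links a source frame to a target frame; such pairs are what a conversion function is typically trained on.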
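The abstract does not give the exact form of the GMM-based conversion function. A widely used form, assumed here for illustration only, fits a GMM with M components to joint source-target LSF vectors and maps a source vector x as shown below. Here α_i are the component weights, μ_i^x and μ_i^y the source and target parts of each mean, and Σ_i^xx and Σ_i^yx the corresponding covariance blocks; the paper's function may differ.

```latex
F(\mathbf{x}) = \sum_{i=1}^{M} p_i(\mathbf{x})
  \left[ \boldsymbol{\mu}_i^{y}
       + \boldsymbol{\Sigma}_i^{yx} \left(\boldsymbol{\Sigma}_i^{xx}\right)^{-1}
         \left(\mathbf{x} - \boldsymbol{\mu}_i^{x}\right) \right],
\qquad
p_i(\mathbf{x}) = \frac{\alpha_i \, \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_i^{x}, \boldsymbol{\Sigma}_i^{xx}\right)}
                       {\sum_{j=1}^{M} \alpha_j \, \mathcal{N}\!\left(\mathbf{x};\, \boldsymbol{\mu}_j^{x}, \boldsymbol{\Sigma}_j^{xx}\right)}
```

Each component contributes a local linear regression from source to target, weighted by the posterior probability p_i(x) that the source frame belongs to that component.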
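The four residual prediction variants compared in the paper are not described in the abstract. The sketch below shows one simple scheme under stated assumptions, not any of the paper's versions: a diagonal-covariance GMM over source feature vectors (for example, LSFs) is assumed to be already trained; each component gets a residual prototype equal to the posterior-weighted mean of the time-aligned target residuals (for example, magnitude spectra); and a converted frame's residual is the posterior-weighted sum of prototypes. The posteriors are computed from the source frame, in line with the abstract's statement that residua are predicted directly from the source speech. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x: Vector, weights: Vector, means: List[Vector],
               variances: List[Vector]) -> List[float]:
    """Component posteriors p_i(x) of a diagonal GMM, computed in the log domain."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the maximum to avoid underflow when exponentiating
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def train_prototypes(src_frames: List[Vector], tgt_residuals: List[Vector],
                     weights: Vector, means: List[Vector],
                     variances: List[Vector]) -> List[List[float]]:
    """Per-component prototype: posterior-weighted mean of aligned target residuals."""
    k, dim = len(weights), len(tgt_residuals[0])
    acc = [[0.0] * dim for _ in range(k)]
    norm = [0.0] * k
    for x, r in zip(src_frames, tgt_residuals):
        for i, p in enumerate(posteriors(x, weights, means, variances)):
            norm[i] += p
            for d in range(dim):
                acc[i][d] += p * r[d]
    # A component that never receives any posterior mass gets a zero prototype.
    return [[a / n if n > 0 else 0.0 for a in row] for row, n in zip(acc, norm)]


def predict_residual(x: Vector, prototypes: List[List[float]], weights: Vector,
                     means: List[Vector], variances: List[Vector]) -> List[float]:
    """Predicted residual for source frame x: sum_i p_i(x) * prototype_i."""
    post = posteriors(x, weights, means, variances)
    dim = len(prototypes[0])
    return [sum(p * proto[d] for p, proto in zip(post, prototypes))
            for d in range(dim)]
```

Because prediction depends only on the source frame, it does not need the converted spectral envelope, which is the design difference the abstract highlights.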