First steps towards new Czech voice conversion system

  • Authors:
  • Zdeněk Hanzlíček; Jindřich Matoušek

  • Affiliations:
  • Department of Cybernetics, University of West Bohemia, Plzeň, Czech Republic; Department of Cybernetics, University of West Bohemia, Plzeň, Czech Republic

  • Venue:
  • TSD'06 Proceedings of the 9th International Conference on Text, Speech and Dialogue
  • Year:
  • 2006

Abstract

In this paper we deal with initial experiments on creating a new Czech voice conversion system. Voice conversion (VC) is a process which modifies the speech signal produced by one (source) speaker so that it sounds like another (target) speaker. Using a VC technique, a new voice for a speech synthesizer can be prepared with no need to record a huge amount of new speech data. The transformation is determined using the same sentences uttered by both speakers; these sentences are time-aligned using a modified dynamic time warping algorithm. The conversion is divided into two stages corresponding to the source-filter model of speech production. Within this work we employ a conversion function based on a Gaussian mixture model for transforming the spectral envelope described by line spectral frequencies. Residuals are converted using so-called residual prediction techniques. Unlike in other similar research works, we predict residuals not from the transformed spectral envelope, but directly from the source speech. Four versions of residual prediction are described and compared in this study. Objective evaluation of converted speech using performance metrics shows that our system is comparable with similar existing VC systems.
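
The time-alignment step mentioned in the abstract can be illustrated with a minimal sketch. The code below is the standard textbook dynamic time warping, not the authors' modified variant, whose details the abstract does not give. The function name `dtw_align`, the Euclidean frame distance, and the three allowed step directions are all assumptions made for illustration; each frame is assumed to be a fixed-length feature vector (for example, a vector of line spectral frequencies).

```python
from math import dist, inf


def dtw_align(source, target):
    """Align two sequences of feature vectors with plain DTW.

    Hypothetical helper for illustration only. Returns (total_cost, path),
    where path is a list of (source_index, target_index) pairs running
    from (0, 0) to (len(source) - 1, len(target) - 1).
    """
    n, m = len(source), len(target)
    # cost[i][j] = minimal accumulated distance for aligning
    # the first i source frames with the first j target frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(source[i - 1], target[j - 1])  # Euclidean distance (assumed)
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in source only
                cost[i][j - 1],      # advance in target only
            )

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

The returned path pairs each source frame with one or more target frames; such pairs are the kind of time-aligned training data from which a conversion function could then be estimated.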