Pitch synchronous transform warping in voice conversion

Authors:
Robert Vích; Martin Vondra
Affiliations:
Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic, Prague 8, Czech Republic (both authors)
Venue:
COST'11 Proceedings of the 2011 international conference on Cognitive Behavioural Systems
Year:
2011

Abstract

This paper presents a new voice conversion algorithm that transforms an utterance of a source speaker into an utterance of a target speaker. The approach is based on pitch-synchronous speech analysis, the Discrete Cosine Transform (DCT), nonlinear spectral warping with spectrum interpolation, and pitch-synchronous speech synthesis with overlapping frames using the speech production model. The DCT speech model also carries information about the phase properties of the modeled speech frame. In contrast to a model based on, e.g., the discrete Fourier transform, however, it is a real-valued model and can therefore be used efficiently for speech coding and voice conversion. The finite impulse response of the converted DCT speech model is obtained by the inverse DCT and is of the mixed-phase type. The proposed voice conversion procedure produces speech with high naturalness.
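The processing chain named in the abstract (pitch-synchronous framing, DCT, nonlinear frequency warping with interpolation, inverse DCT, overlap-add synthesis) can be illustrated with a minimal sketch. This is not the authors' implementation. The following choices are hypothetical and made only for illustration: the first-order all-pass warping function `allpass_warp` with parameter `alpha`, linear interpolation via `numpy.interp`, two-period frames between neighbouring pitch marks, and a periodic Hann window. The sketch also transforms the windowed frames directly instead of building a separate excitation and FIR speech production model. It assumes roughly equal neighbouring pitch periods and resynthesises at the analysis pitch marks, so pitch modification is left out.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: X = C @ x and, since C is orthogonal, x = C.T @ X."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c


def allpass_warp(omega: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical nonlinear warping: first-order all-pass map of [0, pi] onto itself."""
    return omega + 2.0 * np.arctan(alpha * np.sin(omega) / (1.0 - alpha * np.cos(omega)))


def warp_frame(frame: np.ndarray, alpha: float) -> np.ndarray:
    """DCT -> warp the frequency axis with linear interpolation -> inverse DCT."""
    n = len(frame)
    c = dct_matrix(n)
    x_dct = c @ frame                                # real coefficients, no separate phase
    omega = np.pi * np.arange(n) / n                 # DCT-II bin k sits at pi*k/n rad/sample
    src_pos = allpass_warp(omega, alpha) * n / np.pi
    y_dct = np.interp(src_pos, np.arange(n), x_dct)  # read source spectrum at warped bins
    return c.T @ y_dct                               # inverse DCT: real, finite-length output


def periodic_hann(length: int) -> np.ndarray:
    """Periodic Hann window; two copies shifted by length/2 sum exactly to 1."""
    return 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(length) / length)


def convert(signal: np.ndarray, marks: list[int], alpha: float) -> np.ndarray:
    """Pitch-synchronous analysis/synthesis on two-period frames between marks."""
    out = np.zeros(len(signal))
    for left, right in zip(marks[:-2], marks[2:]):
        seg = signal[left:right] * periodic_hann(right - left)
        out[left:right] += warp_frame(seg, alpha)    # overlap-add at the same marks
    return out
```

With `alpha = 0` the warp is the identity, and the interior of the signal (between the second and the second-to-last mark) is reconstructed exactly for equally spaced marks. A positive `alpha` makes each output bin read the source spectrum at a higher frequency, which moves spectral peaks such as formants toward lower frequencies. Because the DCT coefficients are real, the warping operates on one real sequence per frame rather than on separate magnitude and phase spectra.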