Voice conversion based on weighted frequency warping

Authors:
Daniel Erro; Asunción Moreno; Antonio Bonafonte
Affiliations:
Daniel Erro: AhoLab Signal Processing Laboratory, University of the Basque Country, Bilbao, Spain and TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
Asunción Moreno: TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
Antonio Bonafonte: TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

Any modification applied to speech signals has an impact on their perceptual quality. In particular, voice conversion, which modifies a source voice so that it is perceived as a specific target voice, involves prosodic and spectral transformations that produce significant quality degradation. Choosing among current voice conversion methods means trading off the similarity of the converted voice to the target voice against the quality of the resulting converted speech, both rated by listeners. This paper presents a new voice conversion method, termed Weighted Frequency Warping, that achieves a good balance between similarity and quality. The method uses a time-varying piecewise-linear frequency warping function and an energy correction filter, and it combines typical probabilistic techniques with frequency warping transformations. Compared to standard probabilistic systems, Weighted Frequency Warping yields a significant increase in quality scores, whereas the conversion scores remain almost unaltered. The paper discusses the theoretical aspects of the method and the details of its implementation, and it includes the results of an international evaluation of the new system.
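
To make the idea of a time-varying piecewise-linear warping function more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each acoustic class of a probabilistic model has its own piecewise-linear warping function, and that the warping applied to a frame is the average of those functions weighted by the frame's class posteriors. The abstract does not state this formula, and all function names, knot positions, and the 8 kHz frequency range below are hypothetical. The energy correction filter is omitted.

```python
import numpy as np

def piecewise_linear_warp(freqs, src_knots, tgt_knots):
    """Map frequencies through the line segments joining (src, tgt) knots."""
    return np.interp(freqs, src_knots, tgt_knots)

def weighted_warp(freqs, posteriors, class_knots):
    """Blend per-class warping functions using one frame's class posteriors.

    Assumption for illustration: W(f) = sum_i p_i * W_i(f).
    """
    warped = np.zeros_like(freqs, dtype=float)
    for p, (src_knots, tgt_knots) in zip(posteriors, class_knots):
        warped += p * piecewise_linear_warp(freqs, src_knots, tgt_knots)
    return warped

def warp_envelope(envelope, freqs, warped_freqs):
    """Move envelope content from f to W(f), resampled on the original grid.

    W is assumed monotonically increasing, so it is inverted by interpolation.
    """
    inverse = np.interp(freqs, warped_freqs, freqs)  # approximates W^{-1}(g)
    return np.interp(inverse, freqs, envelope)

# Hypothetical example: two classes on a 0-8000 Hz axis.
freqs = np.linspace(0.0, 8000.0, 161)                 # 50 Hz grid
class_knots = [
    ([0.0, 1000.0, 8000.0], [0.0, 1000.0, 8000.0]),   # class 0: identity
    ([0.0, 1000.0, 8000.0], [0.0, 1500.0, 8000.0]),   # class 1: shifts 1000 Hz up
]
posteriors = [0.5, 0.5]                               # changes from frame to frame
blended = weighted_warp(freqs, posteriors, class_knots)

# A toy envelope with a single peak at 1000 Hz, warped by the blended function.
envelope = np.maximum(0.0, 1.0 - np.abs(freqs - 1000.0) / 500.0)
converted = warp_envelope(envelope, freqs, blended)
```

Because the posteriors change from frame to frame, the blended function changes over time even though each per-class function is fixed, which is one plausible reading of "time-varying" in the abstract.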