Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction

Authors:
A. Kain;M. W. Macon
Affiliations:
Center for Spoken Language Understanding, Oregon Graduate Inst., Beaverton, OR, USA;-
Venue:
ICASSP '01: Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02
Year:
2001

Cited 14

Voice morphing using 3D waveform interpolation surfaces and lossless tube area functions
EURASIP Journal on Applied Signal Processing

Quality enhancement of compressed audio based on statistical conversion
EURASIP Journal on Audio, Speech, and Music Processing - Scalable Audio-Content Analysis

Multimodal Human Machine Interactions in Virtual and Augmented Reality
Multimodal Signals: Cognitive and Algorithmic Issues

Transformation procedure for paternal and pathology voices
PDCN '08 Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks

Voice conversion by mapping the speaker-specific features using pitch synchronous approach
Computer Speech and Language

Voice disguise and automatic detection: review and perspectives
Progress in nonlinear speech processing

Voice conversion based on probabilistic parameter transformation and extended inter-speaker residual prediction
TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue

Spectral mapping using artificial neural networks for voice conversion
IEEE Transactions on Audio, Speech, and Language Processing

Developing objective measures of foreign-accent conversion
IEEE Transactions on Audio, Speech, and Language Processing

Voice conversion based on weighted least squares estimation criterion and residual prediction from pitch contour
ACII'05 Proceedings of the First international conference on Affective Computing and Intelligent Interaction

First steps towards new Czech voice conversion system
TSD'06 Proceedings of the 9th international conference on Text, Speech and Dialogue

Data driven approaches to speech and language processing
Nonlinear Speech Modeling and Applications

Comparing ANN and GMM in a voice conversion framework
Applied Soft Computing

Voice conversion using linear prediction coefficients and artificial neural network
Proceedings of the CUBE International Information Technology Conference

Abstract

The purpose of a voice conversion (VC) system is to change the perceived speaker identity of a speech signal. We propose an algorithm that converts the linear predictive coding (LPC) spectrum and predicts the residual as a function of the target envelope parameters. To measure how accurately the converted voices match the desired target voices, we conduct listening tests in which listeners judge whether pairs of utterances come from the same speaker or from different speakers. To establish human performance as a baseline, we first measure listeners' ability to discriminate between original speech utterances under three conditions: normal, fundamental-frequency- and duration-normalized, and LPC-coded. Additionally, the spectral parameter conversion function is tested in isolation by presenting source, target, and converted speakers as LPC-coded speech. The results show that speech whose LPC spectrum has been converted is recognized as the target speaker at the same level of performance as discrimination between speakers in LPC-coded speech. However, speaker discrimination of converted utterances produced by the full VC system is significantly below that of natural speech.
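The abstract relies on splitting speech into an LPC spectral envelope and a residual. The following is a minimal sketch of that decomposition using the standard autocorrelation method with the Levinson-Durbin recursion. It is not the authors' implementation: the function names `lpc` and `synthesize`, the model order, and the toy AR(2) test signal are assumptions made for illustration, and neither the envelope conversion function nor the residual predictor described in the paper is shown.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err), where a[0] == 1 and A(z) = sum_k a[k] z^-k is the
    inverse (whitening) filter, and err is the final prediction-error energy.
    """
    n = len(frame)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def synthesize(residual, a):
    """All-pole filter 1/A(z): rebuilds a signal from its residual."""
    out = np.zeros(len(residual))
    p = len(a) - 1
    for n in range(len(residual)):
        acc = residual[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * out[n - k]
        out[n] = acc
    return out

# Toy stand-in for a speech frame (assumption): white noise through a known AR(2) filter.
rng = np.random.default_rng(0)
x = synthesize(rng.standard_normal(4000), np.array([1.0, -1.3, 0.6]))

a, err = lpc(x, order=2)

# Residual: the part of the signal that the envelope does not explain.
residual = np.convolve(x, a)[:len(x)]

# Spectral envelope magnitude on 257 frequency bins (512-point FFT).
envelope = np.sqrt(err) / np.abs(np.fft.rfft(a, 512))

# Filtering the residual through 1/A(z) recovers the original frame.
recon = synthesize(residual, a)
```

In a conversion setting of the kind the abstract describes, the envelope parameters would be mapped toward the target speaker and a residual would be predicted from them before resynthesis. Here the original residual is reused only to show that the decomposition is lossless.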