Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory

Authors:
T. Toda; A. W. Black; K. Tokuda
Affiliations:
Grad. Sch. of Inf. Sci., Nara Inst. of Sci. & Technol., Nara;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Abstract

In this paper, we describe a novel spectral conversion method for voice conversion (VC). A Gaussian mixture model (GMM) of the joint probability density of source and target features is employed to perform spectral conversion between speakers. The conventional method converts spectral parameters frame by frame based on the minimum mean square error criterion. Although it is reasonably effective, speech quality deteriorates for two reasons: 1) the frame-based conversion process does not always produce appropriate spectral movements, and 2) statistical modeling excessively smooths the converted spectra. To address these problems, we propose a conversion method based on maximum-likelihood estimation of a spectral parameter trajectory. Both static and dynamic feature statistics are used to obtain an appropriate converted spectrum sequence. Moreover, the oversmoothing effect is alleviated by considering a global variance feature of the converted spectra. Experimental results indicate that the proposed method dramatically improves VC performance in terms of both speech quality and conversion accuracy for speaker individuality.
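
The trajectory idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation and rests on several simplifying assumptions that the abstract does not state: features are scalar, each frame already has a single Gaussian with a diagonal covariance (so no mixture weighting is shown), the dynamic feature is the hypothetical window delta[t] = 0.5 * (y[t+1] - y[t-1]) with clamped edges, and the global variance term is left out. Under these assumptions, the maximum-likelihood static sequence y solves the weighted least-squares system (W^T P W) y = W^T P m, where W stacks the static and delta windows, P holds the per-frame precisions, and m holds the per-frame static and delta means. The function names are illustrative only.

```python
def delta_matrix(T):
    """Build W (2T x T): rows 0..T-1 copy y[t], rows T..2T-1 compute delta[t]."""
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[t][t] = 1.0                                  # static feature y[t]
        prev, nxt = max(t - 1, 0), min(t + 1, T - 1)   # clamp at sequence edges
        W[T + t][nxt] += 0.5                           # assumed delta window:
        W[T + t][prev] -= 0.5                          # 0.5 * (y[t+1] - y[t-1])
    return W


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ml_trajectory(static_mean, static_var, delta_mean, delta_var):
    """Return the static sequence y maximizing the Gaussian likelihood of W y."""
    T = len(static_mean)
    W = delta_matrix(T)
    mean = list(static_mean) + list(delta_mean)
    prec = [1.0 / v for v in list(static_var) + list(delta_var)]
    # Normal equations of the weighted least-squares problem:
    # (W^T P W) y = W^T P mean, with P = diag(prec).
    A = [[sum(W[k][i] * prec[k] * W[k][j] for k in range(2 * T))
          for j in range(T)] for i in range(T)]
    b = [sum(W[k][i] * prec[k] * mean[k] for k in range(2 * T))
         for i in range(T)]
    return solve(A, b)


# Usage: frame-wise means that jump between 0 and 1, with delta statistics
# that favor slow movement. The result trades off both sets of statistics.
frame_means = [0.0, 1.0, 0.0, 1.0, 0.0]
trajectory = ml_trajectory(frame_means, [1.0] * 5, [0.0] * 5, [0.1] * 5)
print([round(v, 3) for v in trajectory])
```

In this sketch, a small delta variance pulls the trajectory toward smooth movement, while a very large delta variance makes the result fall back to the frame-wise static means. This is one way to see how dynamic feature statistics couple neighboring frames, which a purely frame-by-frame conversion does not do.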