A HMM-WDLT framework for HNM-based voice conversion with parametric adjustment in formant bandwidth, duration and excitation

Authors:
Hwai-Tsu Hu;Chu Yu
Affiliations:
Department of Electronic Engineering, National I-Lan University, I-Lan, Taiwan, ROC;Department of Electronic Engineering, National I-Lan University, I-Lan, Taiwan, ROC
Venue:
International Journal of Speech Technology
Year:
2012


Abstract

This paper presents a framework, named Hidden Markov Model–Weighted Deviation Linear Transformation (HMM-WDLT), for voice conversion based on the Harmonic + Noise Model (HNM). In a comparative study of spectral conversion, the HMM-WDLT achieves the lowest average spectral distortion. The problem of broadened formant bandwidths can be remedied by a weighting constraint and an ordering check with the minimum clearance estimated from the HMM-WDLT. By jointly exploiting dynamic time warping (DTW) and the HMM-WDLT, duration conversion also becomes feasible. Moreover, the HMM-WDLT contributes to the conversion of excitation-related parameters such as the fundamental frequency, the maximum voiced frequency, and the harmonic magnitudes for critical bands below 2.7 kHz. The ability to modify pitch and duration concurrently allows the HMM-WDLT to carry out prosody conversion. Listening tests reveal that the converted speech successfully captures the target speaker's individuality with satisfactory quality.
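
The abstract does not say how DTW and the HMM-WDLT are combined for duration conversion. As background only, the following is a minimal sketch of textbook dynamic time warping, which aligns the frames of a source utterance with those of a target utterance. It is not the authors' method. The function name `dtw_path`, the Euclidean frame distance, and the toy one-dimensional frames are all assumptions made for illustration.

```python
import math


def dtw_path(src, tgt):
    """Align two non-empty sequences of feature vectors with textbook DTW.

    Returns (total_cost, path), where path is a list of (src_index, tgt_index)
    pairs. Hypothetical helper for illustration, not taken from the paper.
    """
    n, m = len(src), len(tgt)

    def dist(a, b):
        # Euclidean distance between two frames (an assumed local cost).
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    inf = float("inf")
    # cost[i][j] = minimum accumulated cost of aligning src[:i] with tgt[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = dist(src[i - 1], tgt[j - 1]) + step

    # Backtrack from the end to recover the frame-to-frame alignment.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy example: the target holds the middle frame for twice as long.
source_frames = [[0.0], [1.0], [2.0]]
target_frames = [[0.0], [1.0], [1.0], [2.0]]
total_cost, alignment = dtw_path(source_frames, target_frames)
print(total_cost, alignment)
```

In the toy example, source frame 1 is paired with two target frames, which is the kind of local stretching information a duration conversion stage could draw on. How the paper actually uses such alignments is not stated in the abstract.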