A HMM-WDLT framework for HNM-based voice conversion with parametric adjustment in formant bandwidth, duration and excitation

  • Authors:
  • Hwai-Tsu Hu; Chu Yu

  • Affiliations:
  • Department of Electronic Engineering, National I-Lan University, I-Lan, Taiwan, ROC; Department of Electronic Engineering, National I-Lan University, I-Lan, Taiwan, ROC

  • Venue:
  • International Journal of Speech Technology
  • Year:
  • 2012

Abstract

This paper presents a framework, named Hidden Markov Model–Weighted Deviation Linear Transformation (HMM-WDLT), for performing voice conversion based on the Harmonic + Noise Model (HNM). In a comparative study of spectral conversion, the HMM-WDLT achieves the lowest average spectral distortion. The problem of broadened formant bandwidths can be remedied by a weighting constraint and an ordering check with the minimum clearance estimated from the HMM-WDLT. By jointly exploiting dynamic time warping (DTW) and the HMM-WDLT, duration conversion also becomes feasible. Moreover, the HMM-WDLT plays a part in the conversion of excitation-related parameters such as the fundamental frequency, the maximum voiced frequency, and the harmonic magnitudes for critical bands below 2.7 kHz. The ability to modify pitch and duration concurrently allows the HMM-WDLT to carry out prosody conversion. Listening tests reveal that the converted speech successfully captures the speaker's individuality with satisfactory quality.