This paper presents a framework, named Hidden Markov Model-Weighted Deviation Linear Transformation (HMM-WDLT), for performing voice conversion based on the Harmonic plus Noise Model (HNM). In a comparative study of spectral conversion methods, the HMM-WDLT achieves the lowest average spectral distortion. The problem of broadened formant bandwidths can be remedied by a weighting constraint and an ordering check with the minimum clearance estimated from the HMM-WDLT. By jointly exploiting dynamic time warping (DTW) and the HMM-WDLT, duration conversion is also feasible. The HMM-WDLT is further applied to the conversion of excitation-related parameters, including the fundamental frequency, the maximum voiced frequency, and the harmonic magnitudes of the critical bands below 2.7 kHz. Because pitch and duration can be modified concurrently, the HMM-WDLT can also carry out prosody conversion. Listening tests show that the converted speech successfully captures the target speaker's individuality with satisfactory quality.
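The abstract does not spell out how the ordering check with a minimum clearance works. A familiar setting where ordering and minimum spacing matter is line spectral frequencies (LSFs): valid LSFs are strictly increasing, and two neighbouring LSFs that crowd together produce a very sharp spectral peak. The sketch below is only an illustration under stated assumptions: it assumes the spectral envelope is represented as LSFs in radians (the abstract does not say so), it takes the minimum clearance `min_gap` as a given number instead of estimating it from the HMM-WDLT, and the function name `enforce_lsf_clearance` is hypothetical.

```python
import numpy as np

def enforce_lsf_clearance(lsf: np.ndarray, min_gap: float, upper: float = np.pi) -> np.ndarray:
    """Hypothetical ordering check: sort LSFs and keep neighbours >= min_gap apart.

    Assumes LSFs in radians within (0, upper) and (len(lsf) + 1) * min_gap <= upper,
    so that a solution respecting both end points exists.
    """
    out = np.sort(np.asarray(lsf, dtype=float))  # ordering check
    # Forward pass: each LSF sits at least min_gap above its predecessor (and above 0).
    prev = 0.0
    for k in range(len(out)):
        out[k] = max(out[k], prev + min_gap)
        prev = out[k]
    # Backward pass: each LSF sits at least min_gap below its successor (and below upper).
    nxt = upper
    for k in range(len(out) - 1, -1, -1):
        out[k] = min(out[k], nxt - min_gap)
        nxt = out[k]
    return out

# Usage: the crossed, crowded pair 0.5 / 0.52 is reordered and pushed 0.05 rad apart.
fixed = enforce_lsf_clearance(np.array([0.5, 0.52, 0.4, 2.0]), min_gap=0.05)
```

Only values that violate the clearance are moved; LSFs that are already well separated pass through unchanged, which keeps the correction as small as possible.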
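Duration conversion relies on DTW to align source and target frames. The following is a minimal textbook DTW in Python with NumPy, assuming each utterance is a (frames x dimensions) feature matrix compared with Euclidean frame distance and the usual diagonal/horizontal/vertical step pattern. How the paper combines the warping path with the HMM-WDLT is not reproduced here, and `dtw_path` is a hypothetical helper name.

```python
import numpy as np

def dtw_path(source: np.ndarray, target: np.ndarray) -> list[tuple[int, int]]:
    """Classic DTW: return the optimal (source_frame, target_frame) warping path."""
    n, m = len(source), len(target)
    # Pairwise Euclidean distances between every source frame and every target frame.
    dist = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    # Accumulated cost with an infinite border so the path must start at (0, 0).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the last frame pair to the first, always taking the cheapest predecessor.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = {(i - 1, j - 1): acc[i - 1, j - 1], (i - 1, j): acc[i - 1, j], (i, j - 1): acc[i, j - 1]}
        i, j = min(steps, key=steps.get)
        path.append((i - 1, j - 1))
    return path[::-1]

# Usage: the target repeats its first frame, so source frame 0 maps to two target frames.
src = np.array([[0.0], [1.0], [2.0]])
tgt = np.array([[0.0], [0.0], [1.0], [2.0]])
path = dtw_path(src, tgt)
```

Where the path advances several target frames per source frame, the target speaker holds that segment longer; the local slope of the path is therefore one way to read off a duration-scaling factor.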
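The abstract states that the HMM-WDLT also converts the fundamental frequency but does not give the mapping. For comparison only, the sketch below shows a different and widely used baseline, a log-Gaussian mean/variance transformation of F0; it is not the HMM-WDLT. It assumes unvoiced frames are marked with F0 = 0 and that the (mean, standard deviation) of log-F0 for each speaker are known; the name `convert_f0` and the tuple layout are illustrative assumptions.

```python
import numpy as np

def convert_f0(f0_src: np.ndarray, src_stats: tuple[float, float],
               tgt_stats: tuple[float, float]) -> np.ndarray:
    """Baseline (not HMM-WDLT): match log-F0 mean and std; unvoiced frames (0 Hz) stay 0."""
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    out = np.zeros_like(f0_src, dtype=float)
    voiced = f0_src > 0
    # Standardize in the log domain with source statistics, then rescale with target ones.
    out[voiced] = np.exp(mu_t + (sd_t / sd_s) * (np.log(f0_src[voiced]) - mu_s))
    return out

# Usage: a speaker averaging 100 Hz mapped to one averaging 200 Hz with equal log-F0 spread.
converted = convert_f0(np.array([100.0, 0.0, 110.0]), (np.log(100.0), 0.1), (np.log(200.0), 0.1))
```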
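The harmonic magnitudes are converted per critical band below 2.7 kHz, and 2.7 kHz coincides with an edge of Zwicker's critical-band table, which yields 15 bands. One plausible way to turn HNM harmonic amplitudes into such band parameters is to average the log magnitudes of the harmonics that fall inside each band. The dB averaging, the NaN marker for empty bands, and the names `BAND_EDGES_HZ` and `band_log_magnitudes` are assumptions for illustration, not necessarily the paper's parameterization.

```python
import numpy as np

# Zwicker critical-band edges in Hz up to 2.7 kHz (15 bands).
BAND_EDGES_HZ = np.array([0, 100, 200, 300, 400, 510, 630, 770, 920,
                          1080, 1270, 1480, 1720, 2000, 2320, 2700], dtype=float)

def band_log_magnitudes(f0: float, harmonic_mags: np.ndarray) -> np.ndarray:
    """Mean dB magnitude of the harmonics inside each critical band below 2.7 kHz.

    harmonic_mags[k] is the linear amplitude of harmonic k + 1 at (k + 1) * f0.
    Bands containing no harmonic are returned as NaN; harmonics at or above 2.7 kHz are ignored.
    """
    freqs = f0 * np.arange(1, len(harmonic_mags) + 1)
    mags_db = 20.0 * np.log10(np.maximum(harmonic_mags, 1e-12))
    # Band b covers [edge[b], edge[b + 1]); index 15 and above means >= 2.7 kHz.
    band = np.searchsorted(BAND_EDGES_HZ, freqs, side="right") - 1
    out = np.full(len(BAND_EDGES_HZ) - 1, np.nan)
    for b in range(len(out)):
        in_band = band == b
        if np.any(in_band):
            out[b] = mags_db[in_band].mean()
    return out

# Usage: with F0 = 200 Hz, low bands narrower than the harmonic spacing can end up empty.
bands = band_log_magnitudes(200.0, np.ones(20))
```

The empty bands in this example illustrate a practical issue for high-pitched voices: below roughly 1 kHz the critical bands are narrower than the harmonic spacing, so some band parameters have to be interpolated or skipped.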