Very low bit-rate F0 coding for phonetic vocoders using MSD-HMM with quantized F0 symbols

Authors:
Takashi Nose; Takao Kobayashi
Affiliations:
Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama 226-8502, Japan; Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology, Yokohama 226-8502, Japan
Venue:
Speech Communication
Year:
2012

Abstract

This paper presents a very low bit-rate F0 coding technique for phonetic vocoders, based on a hidden Markov model (HMM) that uses phone-level quantized F0 symbols. In the proposed technique, an input F0 sequence is converted into a phone-level F0 symbol sequence by scalar quantization. The quantized F0 symbols represent the rough shape of the original F0 contour and serve as prosodic context for the HMM in the decoding process. Because F0 consists of both voiced and unvoiced regions, we model it with a multi-space probability distribution HMM (MSD-HMM). Synthetic speech is generated from the context-dependent labels and pre-trained MSD-HMMs using the HMM-based speech parameter generation algorithm. By taking the preceding and succeeding contexts, as well as the current one, into account in both modeling and synthesis, we can generate a smooth F0 trajectory similar to the original with only a small number of quantization bits. Experimental results show that the proposed F0 coding outperforms the conventional segment-based F0 coding technique using MSD-VQ. We also demonstrate that the decoded speech of the proposed vocoder has acceptable quality even when the F0 bit rate is below 50 bps.
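The phone-level quantization step described above can be illustrated with a minimal sketch. It rests on assumptions that are not the paper's stated method: each phone's symbol is taken to be the mean log F0 over that phone's voiced frames, uniformly quantized within a fixed log-F0 range, and a phone with no voiced frames simply gets `None`. The function names `phone_f0_symbols` and `f0_bit_rate`, the range bounds, and the bit-rate formula (one fixed-length symbol per phone, with no cost counted for signaling unvoiced phones) are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def phone_f0_symbols(
    f0: Sequence[float],
    phone_bounds: Sequence[Tuple[int, int]],
    n_bits: int,
    log_f0_min: float,
    log_f0_max: float,
) -> List[Optional[int]]:
    """Hypothetical phone-level scalar quantizer for F0.

    f0: frame-level F0 in Hz; 0.0 marks an unvoiced frame.
    phone_bounds: (start, end) frame indices per phone, end exclusive.
    Returns one level in [0, 2**n_bits - 1] per phone, or None when the
    phone has no voiced frames (an assumed 'unvoiced' symbol).
    """
    n_levels = 2 ** n_bits
    step = (log_f0_max - log_f0_min) / n_levels  # uniform step in log-F0
    symbols: List[Optional[int]] = []
    for start, end in phone_bounds:
        # Average log F0 over voiced frames only; unvoiced frames are skipped.
        voiced = [math.log(v) for v in f0[start:end] if v > 0.0]
        if not voiced:
            symbols.append(None)
            continue
        mean_lf0 = sum(voiced) / len(voiced)
        level = int((mean_lf0 - log_f0_min) // step)
        # Clamp values outside the assumed range to the outermost levels.
        symbols.append(min(max(level, 0), n_levels - 1))
    return symbols


def f0_bit_rate(n_bits: int, phones_per_second: float) -> float:
    """F0 bits per second if every phone carries one n_bits symbol."""
    return n_bits * phones_per_second


if __name__ == "__main__":
    f0 = [0.0, 0.0, 100.0, 110.0, 120.0, 0.0, 200.0, 210.0, 0.0, 0.0]
    bounds = [(0, 2), (2, 5), (5, 8), (8, 10)]
    print(phone_f0_symbols(f0, bounds, 2, math.log(80.0), math.log(320.0)))
    print(f0_bit_rate(3, 12.0))
```

With 2 bits and an assumed 80 to 320 Hz range, the example phones map to `[None, 0, 2, None]`. The second call shows the arithmetic behind a sub-50 bps figure under these assumptions: 3 bits per phone at a hypothetical 12 phones per second gives 36 bps. The smoothing of this coarse, step-like symbol sequence into a natural F0 trajectory is what the context-dependent MSD-HMMs and the parameter generation algorithm provide on the decoder side; the sketch covers only the encoder's quantization.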