Tracking vocal tract resonances using a quantized nonlinear function embeddedin a temporal constraint

Authors:
Li Deng;A. Acero;I. Bazzi
Affiliations:
Microsoft Res., Redmond, WA, USA;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Citing 0
Cited 3

Dynamic speech spectrum representation and tracking variable number of vocal tract resonance frequencies with time-varying Dirichlet process mixture models

IEEE Transactions on Audio, Speech, and Language Processing
Combining auditory preprocessing and Bayesian estimation for robust formant tracking

IEEE Transactions on Audio, Speech, and Language Processing
Spectro-temporal analysis of speech signals using zero-time windowing and group delay function

Speech Communication

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper presents a new technique for high-accuracy tracking of vocal-tract resonances (which coincide with formants for nonnasalized vowels) in natural speech. The technique is based on a discretized nonlinear prediction function, which is embedded in a temporal constraint on the quantized input values over adjacent time frames as the prior knowledge for their temporal behavior. The nonlinear prediction is constructed, based on its analytical form derived in detail in this paper, as a parameter-free, discrete mapping function that approximates the "forward" relationship from the resonance frequencies and bandwidths to the Linear Predictive Coding (LPC) cepstra of real speech. Discretization of the function permits the "inversion" of the function via a search operation. We further introduce the nonlinear-prediction residual, characterized by a multivariate Gaussian vector with trainable mean vectors and covariance matrices, to account for the errors due to the functional approximation. We develop and describe an expectation-maximization (EM)-based algorithm for training the parameters of the residual, and a dynamic programming-based algorithm for resonance tracking. Details of the algorithm implementation for computation speedup are provided. Experimental results are presented which demonstrate the effectiveness of our new paradigm for tracking vocal-tract resonances. In particular, we show the effectiveness of training the prediction-residual parameters in obtaining high-accuracy resonance estimates, especially during consonantal closure.