Dynamic Assignment of Gaussian Components in Modelling Speech Spectra
Journal of VLSI Signal Processing Systems
In this paper, we propose a new approach to dynamic speech spectrum representation and to tracking vocal tract resonance (VTR) frequencies. The method represents the spectral density of the speech signal as a mixture of Gaussians with an unknown number of components, for which a time-varying Dirichlet process mixture (DPM) model is utilized. In the resulting representation, the number of formants is allowed to vary in time. The paper first presents an analysis of the continuity of the formants in the spectrum during a speech utterance. The analysis is based on a new state-space representation of the concatenated tube model. We show that the number of formants that appear in the spectrum is directly related to the location of the constriction of the vocal tract (i.e., the location of the excitation). Moreover, the disappearance of formants from the spectrum is explained by the "uncontrollable modes" of the state-space model. Under the assumption that a varying number of formants exist in the spectrum, we propose a DPM-based multi-target tracking algorithm for tracking the unknown number of formants. The tracking algorithm defines a hierarchical Bayesian model over the unknown formant states, and inference is performed via a Rao-Blackwellized particle filter.
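The core representation can be sketched in a few lines of NumPy: treat the normalized power spectrum of one speech frame as a density over frequency and fit a one-dimensional Gaussian mixture by weighted EM, so each fitted mean acts as a candidate formant frequency. This is only an illustrative sketch under simplifying assumptions — the function name, the synthetic two-peak spectrum, and the fixed component count are all hypothetical; the paper's DPM model additionally infers the number of components and couples it over time.

```python
import numpy as np

def fit_spectral_gmm(freqs, power, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture to a power spectrum via weighted EM.

    The normalized spectrum is treated as a density over frequency,
    so each fitted mean is a candidate formant frequency.
    Illustrative sketch only: the number of components is fixed here,
    whereas the paper's DPM model infers it.
    """
    w = power / power.sum()                     # spectral "density" over bins
    # Deterministic init: place means at quantiles of the spectral mass
    cdf = np.cumsum(w)
    qs = (np.arange(n_components) + 0.5) / n_components
    means = np.interp(qs, cdf, freqs)
    span = freqs.max() - freqs.min()
    vars_ = np.full(n_components, (span / (8.0 * n_components)) ** 2)
    pis = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frequency bin
        diff = freqs[None, :] - means[:, None]
        logp = (-0.5 * diff ** 2 / vars_[:, None]
                - 0.5 * np.log(vars_[:, None]) + np.log(pis[:, None]))
        logp -= logp.max(axis=0)
        resp = np.exp(logp)
        resp /= resp.sum(axis=0)
        # M-step: moments weighted by responsibility times spectral mass
        nk = (resp * w).sum(axis=1)
        means = (resp * w * freqs[None, :]).sum(axis=1) / nk
        vars_ = (resp * w * (freqs[None, :] - means[:, None]) ** 2).sum(axis=1) / nk
        pis = nk
    order = np.argsort(means)
    return means[order], np.sqrt(vars_[order]), pis[order]

# Synthetic spectrum with two formant-like peaks near 700 Hz and 1200 Hz
freqs = np.linspace(0.0, 4000.0, 512)
power = (np.exp(-0.5 * ((freqs - 700.0) / 80.0) ** 2)
         + 0.7 * np.exp(-0.5 * ((freqs - 1200.0) / 100.0) ** 2))
means, stds, weights = fit_spectral_gmm(freqs, power)
```

On synthetic data such as the above, the fitted means land near the spectral peaks; in the paper's full model the analogous means would be tracked over frames, with the DPM prior letting components appear and disappear as formants do.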