Dynamic Assignment of Gaussian Components in Modelling Speech Spectra
Journal of VLSI Signal Processing Systems
In this paper, we propose a new approach to dynamic speech spectrum representation and to tracking vocal tract resonance (VTR) frequencies. The method represents the spectral density of the speech signal as a mixture of Gaussians with an unknown number of components, for which a time-varying Dirichlet process mixture (DPM) model is utilized. In the resulting representation, the number of formants is allowed to vary in time. The paper first presents an analysis of the continuity of the formants in the spectrum during a speech utterance. The analysis is based on a new state-space representation of the concatenated tube model. We show that the number of formants that appear in the spectrum is directly related to the location of the constriction of the vocal tract (i.e., the location of the excitation). Moreover, the disappearance of formants from the spectrum is explained by the "uncontrollable modes" of the state-space model. Under the assumption that a varying number of formants exist in the spectrum, we propose a DPM-based multi-target tracking algorithm for tracking the unknown number of formants. The tracking algorithm defines a hierarchical Bayesian model over the unknown formant states, and inference is performed via a Rao-Blackwellized particle filter.
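The core representation can be sketched in a few lines of NumPy: treat the normalized power spectrum of one speech frame as a density over frequency and fit a one-dimensional Gaussian mixture by weighted EM, so each fitted mean acts as a candidate formant frequency. This is only an illustrative sketch under simplifying assumptions — the function name, the synthetic two-peak spectrum, and the fixed component count are all hypothetical; the paper's DPM model additionally infers the number of components and couples it over time.

```python
import numpy as np

def fit_spectral_gmm(freqs, power, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture to a power spectrum via weighted EM.

    The normalized spectrum is treated as a density over frequency,
    so each fitted mean is a candidate formant frequency.
    Illustrative sketch only: the number of components is fixed here,
    whereas the paper's DPM model infers it.
    """
    w = power / power.sum()                     # spectral "density" over bins
    # Deterministic init: place means at quantiles of the spectral mass
    cdf = np.cumsum(w)
    qs = (np.arange(n_components) + 0.5) / n_components
    means = np.interp(qs, cdf, freqs)
    span = freqs.max() - freqs.min()
    vars_ = np.full(n_components, (span / (8.0 * n_components)) ** 2)
    pis = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frequency bin
        diff = freqs[None, :] - means[:, None]
        logp = (-0.5 * diff ** 2 / vars_[:, None]
                - 0.5 * np.log(vars_[:, None]) + np.log(pis[:, None]))
        logp -= logp.max(axis=0)
        resp = np.exp(logp)
        resp /= resp.sum(axis=0)
        # M-step: moments weighted by responsibility times spectral mass
        nk = (resp * w).sum(axis=1)
        means = (resp * w * freqs[None, :]).sum(axis=1) / nk
        vars_ = (resp * w * (freqs[None, :] - means[:, None]) ** 2).sum(axis=1) / nk
        pis = nk
    order = np.argsort(means)
    return means[order], np.sqrt(vars_[order]), pis[order]

# Synthetic spectrum with two formant-like peaks near 700 Hz and 1200 Hz
freqs = np.linspace(0.0, 4000.0, 512)
power = (np.exp(-0.5 * ((freqs - 700.0) / 80.0) ** 2)
         + 0.7 * np.exp(-0.5 * ((freqs - 1200.0) / 100.0) ** 2))
means, stds, weights = fit_spectral_gmm(freqs, power)
```

On synthetic data such as the above, the fitted means land near the spectral peaks; in the paper's full model the analogous means would be tracked over frames, with the DPM prior letting components appear and disappear as formants do.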