International Journal of Speech Technology
We present a framework for estimating formant trajectories that is designed for high robustness in noisy environments. Our approach combines preprocessing based on functional principles of the human auditory system with a probabilistic tracking scheme. To enhance the formant structure in spectrograms, we apply a Gammatone filterbank, spectral preemphasis, and spectral filtering with Difference-of-Gaussians (DoG) operators, followed by a contrast enhancement that mimics competition between filter responses. The probabilistic tracking scheme adopts mixture modeling to estimate the joint distribution of the formants. Combined with an algorithm for adaptive frequency-range segmentation and with Bayesian smoothing, this yields an efficient framework for estimating formant trajectories. Comprehensive evaluations on the VTR-Formant database demonstrate the method's high precision and robustness: it outperforms existing approaches on clean as well as echoic, noisy speech. Finally, an implementation of the framework within an online system using instantaneous feature-based resynthesis demonstrates its applicability to real-world scenarios.
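To illustrate the spectral DoG filtering step, the following is a minimal sketch (not the authors' implementation): each spectrogram frame is convolved along the frequency axis with a center-minus-surround kernel and half-wave rectified, so narrow spectral peaks such as formant candidates stand out against broad spectral trends. The kernel size and the center/surround widths (`sigma_c`, `sigma_s`) are assumed parameters chosen here for illustration only.

```python
import numpy as np

def dog_kernel(size, sigma_c, sigma_s):
    """Difference-of-Gaussians kernel: a narrow 'center' Gaussian
    minus a wide 'surround' Gaussian, each normalized to unit sum."""
    x = np.arange(size) - size // 2
    center = np.exp(-x**2 / (2.0 * sigma_c**2))
    surround = np.exp(-x**2 / (2.0 * sigma_s**2))
    return center / center.sum() - surround / surround.sum()

def enhance_spectral_peaks(spectrogram, size=31, sigma_c=2.0, sigma_s=8.0):
    """Convolve every time frame of a (freq_bins x frames) spectrogram
    along the frequency axis with a DoG kernel, then half-wave rectify
    so only positive (peak-like) responses remain."""
    k = dog_kernel(size, sigma_c, sigma_s)
    out = np.empty_like(spectrogram)
    for t in range(spectrogram.shape[1]):
        out[:, t] = np.convolve(spectrogram[:, t], k, mode="same")
    return np.maximum(out, 0.0)

# Toy example: one frame with a single smooth spectral bump at bin 50.
spec = np.zeros((101, 1))
spec[45:56, 0] = np.hanning(11)
enhanced = enhance_spectral_peaks(spec)
print(int(np.argmax(enhanced[:, 0])))  # peak location is preserved: 50
```

Because the symmetric DoG kernel suppresses slowly varying spectral energy while leaving narrow maxima in place, the rectified output gives sharpened formant candidates without shifting their frequency positions; a tracking stage can then operate on these enhanced responses.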