International Journal of Speech Technology
We present a framework for estimating formant trajectories that is designed for high robustness in noisy environments. Our approach combines preprocessing based on functional principles of the human auditory system with a probabilistic tracking scheme. To enhance the formant structure in spectrograms, we apply a Gammatone filterbank, spectral preemphasis, and spectral filtering with Difference-of-Gaussians (DoG) operators, followed by a contrast enhancement that mimics competition between filter responses. The probabilistic tracking scheme adopts mixture modeling to estimate the joint distribution of the formants. Combined with an algorithm for adaptive frequency-range segmentation and with Bayesian smoothing, this yields an efficient framework for estimating formant trajectories. Comprehensive evaluations on the VTR-Formant database demonstrate the method's high precision and robustness: it outperforms existing approaches on clean as well as echoic, noisy speech. Finally, an implementation of the framework within an online system using instantaneous feature-based resynthesis demonstrates its applicability to real-world scenarios.
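To illustrate the spectral DoG filtering step, the following is a minimal sketch (not the authors' implementation): each spectrogram frame is convolved along the frequency axis with a center-minus-surround kernel and half-wave rectified, so narrow spectral peaks such as formant candidates stand out against broad spectral trends. The kernel size and the center/surround widths (`sigma_c`, `sigma_s`) are assumed parameters chosen here for illustration only.

```python
import numpy as np

def dog_kernel(size, sigma_c, sigma_s):
    """Difference-of-Gaussians kernel: a narrow 'center' Gaussian
    minus a wide 'surround' Gaussian, each normalized to unit sum."""
    x = np.arange(size) - size // 2
    center = np.exp(-x**2 / (2.0 * sigma_c**2))
    surround = np.exp(-x**2 / (2.0 * sigma_s**2))
    return center / center.sum() - surround / surround.sum()

def enhance_spectral_peaks(spectrogram, size=31, sigma_c=2.0, sigma_s=8.0):
    """Convolve every time frame of a (freq_bins x frames) spectrogram
    along the frequency axis with a DoG kernel, then half-wave rectify
    so only positive (peak-like) responses remain."""
    k = dog_kernel(size, sigma_c, sigma_s)
    out = np.empty_like(spectrogram)
    for t in range(spectrogram.shape[1]):
        out[:, t] = np.convolve(spectrogram[:, t], k, mode="same")
    return np.maximum(out, 0.0)

# Toy example: one frame with a single smooth spectral bump at bin 50.
spec = np.zeros((101, 1))
spec[45:56, 0] = np.hanning(11)
enhanced = enhance_spectral_peaks(spec)
print(int(np.argmax(enhanced[:, 0])))  # peak location is preserved: 50
```

Because the symmetric DoG kernel suppresses slowly varying spectral energy while leaving narrow maxima in place, the rectified output gives sharpened formant candidates without shifting their frequency positions; a tracking stage can then operate on these enhanced responses.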