Formant tracking linear prediction model using HMMs and Kalman filters for noisy speech processing

Authors:
Qin Yan;Saeed Vaseghi;Esfandiar Zavarehei;Ben Milner;Jonathan Darch;Paul White;Ioannis Andrianakis
Affiliations:
Institute of Acoustics, Chinese Academy of Sciences, Beijing, China;Department of Electronic and Computer Engineering, Brunel University, UB8 3PH, UK;Department of Electronic and Computer Engineering, Brunel University, UB8 3PH, UK;School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK;School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK;Institute of Sound and Vibration Research, University Road, Highfield, Southampton SO17 1BJ, UK;Institute of Sound and Vibration Research, University Road, Highfield, Southampton SO17 1BJ, UK
Venue:
Computer Speech and Language
Year:
2007

Abstract

This paper presents a formant tracking linear prediction (LP) model for speech processing in noise. The main focus is on exploiting the correlation of the energy contours of speech along the formant tracks to improve formant and LP model estimation in noise. The proposed approach provides a systematic framework for modelling and using the inter-frame correlation of speech parameters across successive speech frames; the within-frame correlations are modelled by the LP parameters. The formant tracking LP model estimation is composed of three stages: (1) a pre-cleaning spectral amplitude estimation stage, in which an initial estimate of the LP model of speech is obtained for each frame; (2) a formant classification and estimation stage using probability models of formants and Viterbi decoders; and (3) an inter-frame formant de-noising and smoothing stage, in which Kalman filters model the formant trajectories and reduce the effect of residual noise on the formants. The adverse effects of car and train noise on estimates of formant tracks and LP models are investigated. The evaluation results demonstrate that combining the initial noise reduction stage with the formant tracking and Kalman smoothing stages significantly reduces errors and distortions in the estimated formant tracking LP model.
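
As a rough illustration of the idea behind the third stage, the sketch below runs a scalar Kalman filter over a single formant track. It is not the paper's model: the random-walk state equation, the function name `kalman_filter_track`, the variance values and the example frequencies are all assumptions chosen for illustration, and only the forward filtering pass is shown.

```python
def kalman_filter_track(observations, process_var=2500.0, obs_var=10000.0):
    """Filter one formant track (Hz per frame) with a scalar Kalman filter.

    Assumed model (not taken from the paper):
        state:       f[t] = f[t-1] + w,  w ~ N(0, process_var)
        observation: z[t] = f[t] + v,    v ~ N(0, obs_var)
    """
    if not observations:
        return []
    estimate = observations[0]   # initialise at the first observed value
    error_var = obs_var          # initial uncertainty: one observation's worth
    filtered = [estimate]
    for z in observations[1:]:
        # Predict: the random walk keeps the state, uncertainty grows.
        pred_var = error_var + process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = pred_var / (pred_var + obs_var)
        estimate = estimate + gain * (z - estimate)
        error_var = (1.0 - gain) * pred_var
        filtered.append(estimate)
    return filtered


# Hypothetical noisy first-formant estimates with one outlier frame (900 Hz).
noisy_f1 = [500.0, 520.0, 480.0, 900.0, 510.0, 495.0, 505.0]
filtered_f1 = kalman_filter_track(noisy_f1)
```

With these assumed variances the filter gives each new frame somewhat less than half the weight, so the outlier frame is pulled back towards the preceding estimates instead of being followed. A larger `obs_var` relative to `process_var` would smooth more heavily, at the cost of lagging behind genuine formant movement.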