A noise robust feature extraction algorithm using joint wavelet packet subband decomposition and AR modeling of speech signals

Authors:
Bojan Kotnik;Zdravko Kačič
Affiliations:
Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova ul. 17, SI 2000 Maribor, Slovenia;Faculty of Electrical Engineering and Computer Science, University of Maribor, Smetanova ul. 17, SI 2000 Maribor, Slovenia
Venue:
Signal Processing
Year:
2007

Abstract

This paper presents a noise robust feature extraction (NRFE) algorithm that uses joint wavelet packet decomposition (WPD) and autoregressive (AR) modeling of the speech signal. In contrast to a time-frequency representation based on the short-time Fourier transform (STFT), wavelet packet decomposition can better represent the non-stationary parts of the speech signal (e.g. consonants). Vowels are well described by an AR model, as in LPC analysis. The wavelet packet parameters are computed using the proposed Root-Log compression scheme. The separately extracted WPD and AR-based parameters are combined and then transformed using linear discriminant analysis (LDA) to produce a lower-dimensional output feature vector. Noise robustness is improved by applying the proposed wavelet-based denoising algorithm, which uses a modified soft thresholding procedure and a time-frequency adaptive threshold. The proposed voice activity detector, based on the skewness-to-kurtosis ratio of the LPC residual signal, is used to perform frame dropping effectively. The speech recognition results achieved on the Aurora 2 and Aurora 3 databases show overall performance improvements of 44.7% and 48.2%, respectively, relative to the baseline MFCC front-end.
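
Two of the building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not define the modified soft thresholding rule, the adaptive threshold, or the exact form of the skewness-to-kurtosis ratio, so the sketch assumes the standard soft thresholding rule, a fixed threshold, autocorrelation-method LPC solved by Levinson-Durbin, and the plain quotient of sample skewness and sample kurtosis. All function names are hypothetical.

```python
import math


def soft_threshold(coeffs, threshold):
    """Standard soft thresholding (assumed; the paper uses a modified variant)."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]


def autocorrelation(frame, max_lag):
    """Biased (unnormalized) autocorrelation for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the AR normal equations.

    Returns (a, err) where a[0] == 1 and A(z) = sum_k a[k] z^-k is the
    prediction-error filter; err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc_residual(frame, order):
    """Filter the frame with A(z); samples before the frame are taken as zero."""
    a, _ = levinson_durbin(autocorrelation(frame, order), order)
    return [sum(a[k] * frame[n - k] for k in range(order + 1) if n - k >= 0)
            for n in range(len(frame))]


def skewness_to_kurtosis_ratio(x):
    """Sample skewness divided by sample (non-excess) kurtosis (assumed form)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness / kurtosis


if __name__ == "__main__":
    # Synthetic frame standing in for one windowed speech frame.
    frame = [math.sin(0.3 * n) + 0.1 * ((n * 7) % 5 - 2) for n in range(64)]
    residual = lpc_residual(frame, order=10)
    print("skewness-to-kurtosis ratio:", skewness_to_kurtosis_ratio(residual))
    print("thresholded:", soft_threshold([3.0, -2.0, 0.5], 1.0))
```

In a front-end of this kind, a frame-dropping decision would compare the ratio against a threshold for each frame. The abstract does not give that threshold or the decision rule, so the sketch leaves them out.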