A new representation for speech frame recognition based on redundant wavelet filter banks

Authors:
Hamid Reza Tohidypour;Seyyed Ali Seyyedsalehi;Hossein Behbood;Hossein Roshandel
Affiliations:
Department of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran;Department of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran;Department of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran;Department of Electrical Engineering, Amirkabir University of Technology (Tehran-Polytechnic), Tehran, Iran
Venue:
Speech Communication
Year:
2012

Abstract

Although the conventional wavelet transform offers multi-resolution analysis, it is not optimized for speech recognition systems. It performs worse than Mel Frequency Cepstral Coefficients (MFCCs), whose Mel scale is based on human auditory perception. This paper proposes several new speech representations based on redundant wavelet filter-banks (RWFB). RWFB parameters are much less shift-sensitive than those of the critically sampled discrete wavelet transform (DWT) and offer better time-frequency localization, so they are expected to perform better in speech recognition tasks. This improvement, however, comes at the expense of higher redundancy. Several types of wavelet representations are introduced, including a combination of the critically sampled DWT and several different multi-channel redundant filter-banks down-sampled by 2. To find appropriate filter values for the multi-channel filter-banks, the effects of changing the zero moments of the proposed wavelets are discussed. The performance of the corresponding methods is compared in a phoneme recognition task using time delay neural networks. The results show that redundant multi-channel wavelet filter-banks work better than the conventional DWT in speech recognition systems. The proposed four-channel higher density discrete wavelet filter-bank increases the recognition rate by up to approximately 8.95% compared with the critically sampled two-channel wavelet filter-bank.
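
The shift-sensitivity claim can be illustrated with a small sketch. It is not the filter-bank design from the paper: it assumes a one-level Haar filter pair with circular boundary handling, and it uses a fully undecimated transform as a simple stand-in for a redundant filter-bank. The function names `haar_analysis` and `detail_energy` are hypothetical. The sketch compares the high-pass (detail) energy of a step signal with that of the same signal delayed by one sample.

```python
import numpy as np

def haar_analysis(x, decimate):
    """One-level Haar analysis with circular boundaries (illustrative only)."""
    x = np.asarray(x, dtype=float)
    nxt = np.roll(x, -1)                 # x[n+1], wrapping around at the end
    low = (x + nxt) / np.sqrt(2.0)       # low-pass channel
    high = (x - nxt) / np.sqrt(2.0)      # high-pass (detail) channel
    if decimate:                         # critically sampled: keep every 2nd sample
        low, high = low[::2], high[::2]
    return low, high

def detail_energy(x, decimate):
    """Energy of the detail channel, a crude frame-level feature."""
    _, high = haar_analysis(x, decimate)
    return float(np.sum(high ** 2))

# Step signal and a copy delayed by one sample.
x = np.zeros(16)
x[8:] = 1.0
shifted = np.roll(x, 1)

for name, decimate in [("critically sampled", True), ("undecimated", False)]:
    e0 = detail_energy(x, decimate)
    e1 = detail_energy(shifted, decimate)
    print(f"{name}: energy {e0:.3f} -> {e1:.3f} after a one-sample shift")
```

With down-sampling, the detail energy of this signal depends on whether its edges fall on kept or discarded samples, so a one-sample delay changes the feature. Without down-sampling, the energy is unchanged, at the cost of twice as many coefficients.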