Speech enhancement by joint statistical characterization in the Log Gabor Wavelet domain

  • Authors:
  • Suman Senapati; Sandipan Chakroborty; Goutam Saha

  • Affiliations:
  • Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, Kharagpur 721 302, India (all three authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2008

Abstract

In speech enhancement, Bayesian marginal models cannot capture the statistical dependencies between wavelet coefficients at different scales. Simple non-linear estimators for wavelet-based denoising assume that coefficients at different scales are independent, yet wavelet coefficients exhibit significant inter-scale dependencies. This paper introduces a method that models the dependency between each coefficient and its parent in the Log Gabor Wavelet (LGW) domain with a Circularly Symmetric Probability Density Function (CS-PDF) related to the family of Spherically Invariant Random Processes (SIRPs), and derives the corresponding joint shrinkage estimators by Maximum a Posteriori (MAP) estimation. Two joint shrinkage estimators are presented. In the first, the inter-scale variance of the LGW coefficients is held constant, which yields a closed-form solution. In the second, a more complex approach, the variance is not constrained to be constant. The proposed methods are also shown to perform better when speech uncertainty is taken into account. The robustness of the proposed frameworks is tested on 50 speakers from the POLYCOST and YOHO speech corpora in four different noisy environments, against four established speech enhancement algorithms. Experimental results show that the proposed estimators yield a larger improvement in Segmental SNR (S-SNR) and a lower Log Spectral Distortion (LSD) than the other estimators. In a second evaluation, on the AURORA 2.0 speech corpus, the proposed enhancement techniques give more robust digit recognition in noisy conditions than the competing methods.
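The abstract does not state the estimators' formulas. For intuition only, the sketch below implements the well-known constant-variance bivariate shrinkage rule of Sendur and Selesnick, which is a MAP estimator of a child coefficient given its parent under a circularly symmetric bivariate Laplacian prior and additive Gaussian noise. It is assumed, not confirmed, to resemble the paper's closed-form first estimator; the LGW transform itself is not shown. The function names `bivariate_shrink` and `signal_std` are hypothetical, and the global (rather than local) variance estimate is a simplifying assumption.

```python
import numpy as np


def bivariate_shrink(y_child, y_parent, sigma_n, sigma):
    """Joint (child, parent) MAP shrinkage with a constant signal std `sigma`.

    Assumes a circularly symmetric bivariate Laplacian prior and Gaussian noise
    of std `sigma_n` (Sendur-Selesnick rule, not necessarily the paper's).
    Works for real or complex coefficients; only the child is returned.
    """
    y_child = np.asarray(y_child)
    y_parent = np.asarray(y_parent)
    # Joint magnitude of the child and its parent at the next coarser scale.
    r = np.hypot(np.abs(y_child), np.abs(y_parent))
    # Threshold sqrt(3) * sigma_n^2 / sigma follows from the bivariate Laplacian prior.
    thresh = np.sqrt(3.0) * sigma_n**2 / sigma
    # Soft-threshold the joint magnitude; guard the division where r == 0.
    gain = np.maximum(r - thresh, 0.0) / np.where(r > 0, r, 1.0)
    return gain * y_child


def signal_std(y_child, sigma_n):
    """Hypothetical global estimate of the clean-coefficient std (noisy var minus noise var)."""
    var_y = np.mean(np.abs(np.asarray(y_child)) ** 2)
    return np.sqrt(max(var_y - sigma_n**2, 1e-12))
```

A large parent keeps a small child alive (for example, `bivariate_shrink(1.0, 5.0, 1.0, 1.0)` is non-zero while `bivariate_shrink(1.0, 0.0, 1.0, 1.0)` is zero). This is exactly the inter-scale dependency that a marginal estimator, which looks at the child alone, cannot exploit.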