Optimal speech enhancement under signal presence uncertainty using Log Gabor Wavelet and Bayesian Joint Statistics

Authors:
Suman Senapati
Affiliations:
Indian Institute of Technology, Kharagpur, India
Venue:
International Journal of Speech Technology
Year:
2013

Abstract

This paper investigates single-microphone speech enhancement when the statistics of the interfering noise and of the speech are not available a priori. It thus addresses a pitfall of many current enhancement techniques and looks towards a system with real-world application. The paper focuses on a Log Gabor Wavelet (LGW) based Long Term Squared Spectral Amplitude estimator using the Maximum a Posteriori (MAP) criterion. First, a long-term cepstral mean subtraction technique with LGW is proposed to suppress telephone channel and handset effects in the speech signals. Then a novel speech enhancer based on a MAP Bayesian bivariate model is developed to suppress the background noise. The work also models the inter-scale dependency between the coefficients and their parents with a circularly symmetric probability density function related to the family of Spherically Invariant Random Processes (SIRPs). The corresponding joint estimator is derived by MAP estimation theory. The inter-scale noise variance of the coefficients is kept constant, which gives a closed-form solution. Incorporating a speech presence uncertainty (SPU) estimator is another contribution. The main contributions are therefore: (i) the combination of LGW, SIRPs and SPU for background noise reduction; (ii) LGW with Long Term Cepstral Mean Subtraction to reduce the effects of both telephone channel and handsets; (iii) a circularly symmetric probability density function that exploits the inter-scale dependency between the coefficients and their parents, with the corresponding joint estimators derived by MAP estimation theory; (iv) a constant inter-scale noise variance of the coefficients, which gives a closed-form solution; and (v) refinement of the magnitude estimates by scaling them by the SPU probability.
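As a rough illustration of contributions (iii) to (v), the sketch below applies the classical bivariate shrinkage rule of Sendur and Selesnick, which is the MAP estimate under one particular circularly symmetric prior on a coefficient and its parent, and then scales the result by a speech presence probability. This is not the paper's own estimator, which the abstract does not spell out. The function name, the real-valued coefficients, and the parameters `noise_var`, `signal_std` and `presence_prob` are all assumptions made for the example.

```python
import numpy as np

def bivariate_shrink(child, parent, noise_var, signal_std, presence_prob=1.0):
    """Illustrative bivariate MAP shrinkage with speech-presence scaling.

    Assumed (hypothetical) inputs: real-valued wavelet coefficients `child`
    and their coarser-scale `parent`, a constant noise variance, a signal
    standard deviation, and a speech presence probability in [0, 1].
    """
    child = np.asarray(child, dtype=float)
    parent = np.asarray(parent, dtype=float)
    # Joint magnitude of the coefficient and its parent.
    radius = np.sqrt(child ** 2 + parent ** 2)
    # Threshold of the classical bivariate shrinkage rule.
    threshold = np.sqrt(3.0) * noise_var / signal_std
    # Soft shrinkage gain; the floor on the denominator avoids division by zero.
    gain = np.maximum(radius - threshold, 0.0) / np.maximum(radius, 1e-12)
    # Scale the shrunk coefficient by the speech presence probability.
    return presence_prob * gain * child
```

Because the noise variance is held constant, the threshold is a single number and the estimate stays in closed form. A coefficient is kept only when it and its parent are jointly large, which is how the inter-scale dependency enters.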
The proposed methods are compared extensively with existing speech enhancement algorithms on the NOIZEUS speech database, which contains different types of noise. We report subjective and objective evaluations of the proposed methods against four classes of algorithms: spectral subtractive, subspace, statistical model based and Wiener type. Experimental results show that the proposed estimator yields a higher improvement in Segmental SNR (SSNR), lower Log Area Ratio (LAR) and Weighted Spectral Slope (WSS) distortion, and higher Perceptual Evaluation of Speech Quality (PESQ) and Mean Opinion Score (MOS) than the existing speech enhancement algorithms. On the SSNR measure, the proposed methods show a 2 dB improvement over existing methods for almost every noise source. On the MOS measure, the proposed methods also improve on existing methods for almost every noise source. The proposed methods therefore aim to enhance speech quality and intelligibility at the same time.
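For readers unfamiliar with the SSNR measure, the following is a minimal sketch of a common frame-wise definition, assuming time-aligned clean and enhanced signals. The frame length and the clamping range of -10 dB to 35 dB are conventional choices assumed here, not values taken from the paper, and the function name is hypothetical.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB, with each frame clamped to [lo, hi].

    Assumes `clean` and `enhanced` are time-aligned 1-D signals; frame_len,
    lo and hi are illustrative defaults, not the paper's settings.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = min(len(clean), len(enhanced)) // frame_len
    eps = 1e-12  # guards against log of zero and division by zero
    frame_snrs = []
    for k in range(n_frames):
        start, stop = k * frame_len, (k + 1) * frame_len
        s = clean[start:stop]
        err = s - enhanced[start:stop]
        snr_db = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(err ** 2) + eps))
        frame_snrs.append(np.clip(snr_db, lo, hi))
    # No complete frame means the measure is undefined.
    return float(np.mean(frame_snrs)) if frame_snrs else float("nan")
```

An SSNR improvement such as the one quoted above would then be the difference between this value for the enhanced signal and for the unprocessed noisy signal, both measured against the same clean reference.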