Single-channel speech enhancement using spectral subtraction in the short-time modulation domain

Authors:
Kuldip Paliwal; Kamil Wójcicki; Belinda Schwerin
Affiliations:
Signal Processing Laboratory, Griffith School of Engineering, Griffith University, Nathan QLD 4111, Australia (all authors)
Venue:
Speech Communication
Year:
2010

Abstract

In this paper, we investigate the modulation domain as an alternative to the acoustic domain for speech enhancement. More specifically, we wish to determine how competitive the modulation domain is for spectral subtraction compared to the acoustic domain. For this purpose, we extend the traditional analysis-modification-synthesis framework to include modulation domain processing. We then compensate the noisy modulation spectrum for additive noise distortion by applying the spectral subtraction algorithm in the modulation domain. Using an objective speech quality measure as well as formal subjective listening tests, we show that the proposed method results in improved speech quality. Furthermore, the proposed method achieves better noise suppression than the minimum mean-square error (MMSE) method. In this study, the effect of modulation frame duration on the speech quality of the proposed enhancement method is also investigated. The results indicate that modulation frame durations of 180-280 ms provide a good compromise between different types of spectral distortions, namely musical noise and temporal slurring. Thus, given a proper selection of modulation frame duration, the proposed modulation spectral subtraction does not suffer from the musical noise artifacts typically associated with acoustic spectral subtraction. In order to achieve further improvements in speech quality, we also propose and investigate fusion of modulation spectral subtraction with the MMSE method. The fusion is performed in the short-time spectral domain by combining the magnitude spectra of the above speech enhancement algorithms. Subjective and objective evaluation of the speech enhancement fusion shows consistent speech quality improvements across input SNRs.
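The processing chain outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 32 ms acoustic frame with 8 ms shift, the 256 ms modulation frame (picked from the 180-280 ms range quoted above) with 32 ms shift, the subtraction parameters `rho`, `beta` and `gamma`, and the noise estimate taken from the first few modulation frames (assumed to be speech-free) are all assumptions made for illustration. The fusion with the MMSE method is not shown.

```python
import numpy as np

def stft(x, frame_len, hop):
    """Hann-windowed short-time FFT of a real 1-D sequence -> (frames, bins)."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame_len, hop, length):
    """Windowed overlap-add synthesis. The constant gain assumes the 75%
    overlap used below; samples near both ends are tapered, not restored."""
    win = np.hanning(frame_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1)
    out = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame * win
    return out * hop / np.sum(win ** 2)

def spectral_subtract(mag, noise_mag, rho=1.0, beta=0.002, gamma=2.0):
    """Subtract a noise estimate in the |.|^gamma domain with a spectral floor.
    rho, beta and gamma are illustrative values (assumption)."""
    diff = mag ** gamma - rho * noise_mag ** gamma
    floor = beta * noise_mag ** gamma
    return np.maximum(diff, floor) ** (1.0 / gamma)

def modulation_spectral_subtraction(x, fs, noise_mod_frames=4):
    """Analysis-modification-synthesis with a second, modulation-domain stage."""
    a_len, a_hop = int(round(0.032 * fs)), int(round(0.008 * fs))
    m_len, m_hop = 32, 4  # in acoustic frames: 256 ms frame, 32 ms shift

    # Acoustic analysis: keep the noisy acoustic phase for resynthesis.
    X = stft(x, a_len, a_hop)
    mag, phase = np.abs(X), np.angle(X)

    mag_hat = np.empty_like(mag)
    for k in range(mag.shape[1]):
        # Modulation analysis: STFT of the magnitude trajectory of bin k.
        M = stft(mag[:, k], m_len, m_hop)
        m_mag, m_phase = np.abs(M), np.angle(M)
        # Assumption: the first few modulation frames contain noise only.
        noise = m_mag[:noise_mod_frames].mean(axis=0)
        s_mag = spectral_subtract(m_mag, noise)
        # Modulation synthesis with the noisy modulation phase.
        traj = istft(s_mag * np.exp(1j * m_phase), m_len, m_hop, mag.shape[0])
        mag_hat[:, k] = np.maximum(traj, 0.0)  # magnitudes cannot be negative

    # Acoustic synthesis with the modified magnitudes and noisy phase.
    return istft(mag_hat * np.exp(1j * phase), a_len, a_hop, len(x))
```

A call such as `y = modulation_spectral_subtraction(x, fs)` expects a mono signal `x` that starts with roughly 0.4 s of noise only, so that the noise estimate is meaningful. A practical system would track the noise continuously rather than rely on that assumption.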