Psychoacoustically constrained and distortion minimized speech enhancement

Authors:
Seokhwan Jo;Chang D. Yoo
Affiliations:
Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea;Department of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, Korea
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

This paper considers a psychoacoustically constrained and distortion-minimized speech enhancement algorithm. Noise reduction generally introduces speech distortion, so a balanced tradeoff between noise reduction and speech distortion must be attained. A constrained optimization problem is formulated to reduce noise such that speech distortion is minimized while the sum of speech distortion and residual noise is kept below the masking threshold of the clean speech. Because this optimization problem may be infeasible under certain conditions, a slack variable is introduced to allow limited deviation from the constraints. To estimate the power spectral density (PSD) and the masking threshold of the clean speech, a speech model that assumes coexisting deterministic and stochastic components is used. Experimental results show that the considered algorithm outperforms several popular algorithms in terms of segmental signal-to-noise ratio (SegSNR) improvement, spectral distance (SD), modified Bark spectral distortion (MBSD), and mean opinion score (MOS).
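The abstract does not state the exact formulation, so the following Python sketch is only a rough, hypothetical illustration of the idea of a slack-relaxed masking constraint, not the authors' method. It assumes a simplified per-frequency-bin model: a real gain g in [0, 1] is applied to a noisy bin whose clean-speech PSD is S and noise PSD is N, so the speech distortion energy is (1 - g)^2 S and the residual noise energy is g^2 N. Distortion is minimized subject to (1 - g)^2 S + g^2 N <= T + xi, where T is the masking threshold and xi >= 0 is a slack variable penalized with weight C. The function names, the penalty weight `C`, and the spectral-gain parameterization are all assumptions; the paper may work in a different signal domain.

```python
import math


def total_error(g, S, N):
    """Speech distortion plus residual noise energy for gain g (hypothetical per-bin model)."""
    return (1.0 - g) ** 2 * S + g ** 2 * N


def objective(g, S, N, T, C):
    """Distortion plus C * slack, with the slack eliminated as max(0, total - T)."""
    distortion = (1.0 - g) ** 2 * S
    slack = max(0.0, total_error(g, S, N) - T)
    return distortion + C * slack


def constrained_gain(S, N, T, C=10.0):
    """Return (g, slack) minimising the slack-penalised objective over g in [0, 1].

    Assumes S > 0, N >= 0, T >= 0, C > 0. C = 10.0 is an arbitrary illustrative default.
    The objective equals max(h1, Q) with h1 = (1-g)^2 S and
    Q = h1 + C * (total - T), both convex quadratics, so its minimiser is an
    endpoint, a point where total == T, the minimiser of h1 (g = 1), or the
    minimiser of Q. Evaluating all candidates gives the exact optimum.
    """
    if S <= 0.0 or N < 0.0 or T < 0.0 or C <= 0.0:
        raise ValueError("expected S > 0, N >= 0, T >= 0, C > 0")
    candidates = [0.0, 1.0]
    # Gains where distortion + residual noise exactly meets the threshold T:
    # (S + N) g^2 - 2 S g + (S - T) = 0 has real roots iff T (S + N) >= S N.
    disc = T * (S + N) - S * N
    if disc >= 0.0:
        r = math.sqrt(disc)
        candidates += [(S - r) / (S + N), (S + r) / (S + N)]
    # Stationary point of the penalised branch Q(g).
    candidates.append((1.0 + C) * S / ((1.0 + C) * S + C * N))
    candidates = [min(1.0, max(0.0, g)) for g in candidates]
    g_best = min(candidates, key=lambda g: objective(g, S, N, T, C))
    return g_best, max(0.0, total_error(g_best, S, N) - T)


if __name__ == "__main__":
    # Noise already below the threshold: no suppression, no slack needed.
    print(constrained_gain(1.0, 0.1, 0.5))
    # Threshold below the smallest achievable total S*N/(S+N) = 0.5: slack is used.
    print(constrained_gain(1.0, 1.0, 0.4))
```

In the second call the hard constraint alone would be infeasible, because no gain can push (1 - g)^2 S + g^2 N below S N / (S + N); the slack absorbs the excess instead of leaving the problem without a solution. Raising `C` makes violations more expensive, and when the constraint is feasible the solution then picks the largest gain that still meets T, which is the least-distorting admissible choice in this toy model.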