Improving objective intelligibility prediction by combining correlation and coherence based methods with a measure based on the negative distortion ratio

Authors:
Angel M. Gómez;Belinda Schwerin;Kuldip Paliwal
Affiliations:
Dept. Teoría de la Señal, Telemática y Comunicaciones, University of Granada, Facultad de Ciencias, Campus de Fuentenueva S/N, Granada 18071, Spain and Signal Processing Laboratory, ...;Signal Processing Laboratory, School of Engineering, Griffith University, Australia;Signal Processing Laboratory, School of Engineering, Griffith University, Australia
Venue:
Speech Communication
Year:
2012

Citing 4
Cited 2

Speech enhancement based on a priori signal to noise estimation

Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96) - Volume 02
Prediction of speech intelligibility based on an auditory preprocessing model

Speech Communication
SNR loss: A new objective measure for predicting the intelligibility of noise-suppressed speech

Speech Communication
Evaluation of Objective Quality Measures for Speech Enhancement

IEEE Transactions on Audio, Speech, and Language Processing

A Hilbert-fine-structure-derived physical metric for predicting the intelligibility of noise-distorted and noise-suppressed speech

Speech Communication
Objective Intelligibility Measures Based on Mutual Information for Speech Subjected to Speech Enhancement Processing

IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)

Abstract

In this paper we propose a novel objective method for predicting the intelligibility of enhanced speech. It is based on the negative distortion ratio (NDR), that is, the amount of spectral power that has been removed relative to the original clean speech signal, likely due to a poor noise estimate during the speech enhancement procedure. Although negative spectral distortions can be highly significant in the subjective intelligibility assessment of processed speech, most objective measures in the literature do not account well for this type of distortion. The proposed method focuses on a very specific type of noise, so it is not intended to be used alone but in combination with other techniques, to jointly achieve better intelligibility prediction. To find an appropriate technique to combine it with, we also review a number of recently proposed methods based on correlation and coherence measures. These methods have already shown high correlation with human recognition scores, as they effectively detect the nonlinearities frequently found in noise-suppressed speech. However, when these techniques are applied jointly with the proposed method, significantly higher correlations (above r = 0.9) are achieved.
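
To make the idea of a negative distortion ratio concrete, the following is a minimal sketch and not the formula from the paper, which the abstract does not give. It assumes that the clean and processed signals are already available as power spectrograms of equal shape (frames by frequency bins), and that the ratio is simply the removed power divided by the total clean power. The function name `negative_distortion_ratio` and this normalization are illustrative assumptions.

```python
from typing import Sequence


def negative_distortion_ratio(
    clean_power: Sequence[Sequence[float]],
    processed_power: Sequence[Sequence[float]],
) -> float:
    """Fraction of clean-speech spectral power missing from the processed signal.

    Both arguments are power spectrograms (frames x frequency bins) of equal
    shape. Illustrative definition only; the paper's exact NDR may differ.
    """
    if len(clean_power) != len(processed_power):
        raise ValueError("spectrograms must have the same number of frames")

    removed = 0.0  # power present in the clean signal but absent after processing
    total = 0.0    # total power of the clean signal
    for clean_frame, processed_frame in zip(clean_power, processed_power):
        if len(clean_frame) != len(processed_frame):
            raise ValueError("frames must have the same number of bins")
        for c, p in zip(clean_frame, processed_frame):
            # Count only negative distortion: bins where power was taken away.
            removed += max(c - p, 0.0)
            total += c

    # A silent clean signal has nothing that could be removed.
    return removed / total if total > 0.0 else 0.0


# Hypothetical 2-frame, 2-bin spectrograms.
clean = [[4.0, 2.0], [1.0, 3.0]]
processed = [[3.0, 2.5], [1.0, 1.0]]
ratio = negative_distortion_ratio(clean, processed)
```

In this toy example, bins where the processed power exceeds the clean power (positive distortion, such as residual noise) contribute nothing, which is why such a measure would need to be paired with a correlation- or coherence-based measure, as the abstract describes.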