The analysis of the simplification from the ideal ratio to binary mask in signal-to-noise ratio sense

Authors:
Shan Liang; Wenju Liu; Wei Jiang; Wei Xue
Venue:
Speech Communication
Year:
2014

Abstract

For speech separation systems, the ideal binary mask (IBM) can be viewed as a simplified goal of the ideal ratio mask (IRM), which is derived from the Wiener filter. Existing research usually justifies this simplification in terms of speech intelligibility. However, the difference between the two masks has not been addressed rigorously in the signal-to-noise ratio (SNR) sense. In this paper, we analytically investigate the difference between the two ideal masks under the assumption of approximate W-Disjoint Orthogonality (AWDO), which holds approximately under many kinds of interference because of the sparse nature of speech. From this analysis, a theoretical upper bound on the difference is obtained under the AWDO assumption. Other findings include a new ratio mask that achieves higher SNR gains than the IRM, and the essential relation between the degree of AWDO and the SNR gain of the IRM.
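
To make the two masks concrete, the following is a minimal sketch and not the paper's method. It assumes the commonly used definitions: the IRM as the Wiener-like ratio of speech power to speech-plus-noise power in each time-frequency unit, and the IBM as 1 wherever the local SNR exceeds a threshold (0 dB here). The function names, the toy sparse "speech" signal, the Gaussian noise, and the output-SNR measure are all illustrative assumptions; the abstract does not give the paper's exact formulas.

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, eps=1e-12):
    """Assumed Wiener-like form: speech power / (speech + noise power)."""
    return speech_power / (speech_power + noise_power + eps)

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Assumed form: 1 where the local SNR exceeds lc_db (in dB), else 0."""
    threshold = 10.0 ** (lc_db / 10.0)
    return (speech_power > threshold * noise_power).astype(float)

def output_snr_db(mask, speech, noise):
    """SNR of the masked mixture, measured against the clean coefficients."""
    estimate = mask * (speech + noise)
    residual = estimate - speech
    return 10.0 * np.log10(
        np.sum(np.abs(speech) ** 2) / np.sum(np.abs(residual) ** 2)
    )

# Toy data (hypothetical): "speech" is sparse in time-frequency, noise is dense.
rng = np.random.default_rng(0)
shape = (257, 200)  # frequency bins x frames, chosen arbitrarily
active = rng.random(shape) < 0.2
speech = active * 3.0 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
noise = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

speech_power = np.abs(speech) ** 2
noise_power = np.abs(noise) ** 2

irm = ideal_ratio_mask(speech_power, noise_power)
ibm = ideal_binary_mask(speech_power, noise_power)

# An all-ones mask leaves the mixture untouched, which gives the input SNR.
snr_in = output_snr_db(np.ones(shape), speech, noise)
snr_irm = output_snr_db(irm, speech, noise)
snr_ibm = output_snr_db(ibm, speech, noise)

print(f"input SNR:  {snr_in:.2f} dB")
print(f"IRM output: {snr_irm:.2f} dB")
print(f"IBM output: {snr_ibm:.2f} dB")
```

Under these assumed definitions, the binary mask with a 0 dB threshold is the ratio mask rounded at 0.5, which is one way to read "simplification" here. The printed SNR values depend entirely on the toy data and say nothing about the bounds derived in the paper.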