For speech separation systems, the ideal binary mask (IBM) can be viewed as a simplified target of the ideal ratio mask (IRM), which is derived from the Wiener filter. Existing research usually justifies this simplification in terms of speech intelligibility. However, the difference between the two masks has not been analyzed rigorously in the signal-to-noise ratio (SNR) sense. In this paper, we analytically investigate the difference between the two ideal masks under the assumption of approximate W-disjoint orthogonality (AWDO). This assumption nearly holds for many kinds of interference, owing to the sparse nature of speech. From this analysis, we derive a theoretical upper bound on the difference under the AWDO assumption. Other findings include a new ratio mask that achieves higher SNR gains than the IRM, and the underlying relation between the degree of AWDO and the SNR gain of the IRM.
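The quantities compared above can be made concrete with a small NumPy sketch. It is not the paper's derivation and rests on several assumptions:

- **Inputs.** Speech and noise are available separately as complex time-frequency (e.g. STFT) arrays, as any "ideal" mask requires.
- **IBM.** It uses a local criterion of 0 dB.
- **IRM.** It is taken as the power-domain Wiener gain |S|²/(|S|²+|N|²).
- **Output SNR.** It is measured for the masked mixture against the clean speech.
- **Disjointness.** It is scored with a W-disjoint-orthogonality measure of the form (‖M·S‖² − ‖M·N‖²)/‖S‖². This measure equals 1 when the sources never overlap. The paper's own definition of the AWDO degree may differ.

All function names are hypothetical.

```python
import numpy as np

EPS = np.finfo(float).eps  # guards against log(0) and division by zero


def ideal_binary_mask(speech, noise, lc_db=0.0):
    """IBM (assumed form): 1 where the local SNR exceeds lc_db, else 0."""
    ps, pn = np.abs(speech) ** 2, np.abs(noise) ** 2
    local_snr_db = 10.0 * np.log10((ps + EPS) / (pn + EPS))
    return (local_snr_db > lc_db).astype(float)


def ideal_ratio_mask(speech, noise):
    """IRM (assumed Wiener-style form): |S|^2 / (|S|^2 + |N|^2) per T-F unit."""
    ps, pn = np.abs(speech) ** 2, np.abs(noise) ** 2
    return ps / (ps + pn + EPS)


def output_snr_db(speech, noise, mask):
    """SNR of the masked mixture mask*(S+N), with the clean speech S as reference."""
    estimate = mask * (speech + noise)
    err = np.sum(np.abs(speech - estimate) ** 2)
    return 10.0 * np.log10(np.sum(np.abs(speech) ** 2) / (err + EPS))


def wdo_measure(speech, noise, mask):
    """(||M S||^2 - ||M N||^2) / ||S||^2; equals 1 for perfectly disjoint sources."""
    kept_speech = np.sum(np.abs(mask * speech) ** 2)
    leaked_noise = np.sum(np.abs(mask * noise) ** 2)
    return (kept_speech - leaked_noise) / np.sum(np.abs(speech) ** 2)


# Demo on synthetic T-F data (hypothetical; not real speech).
rng = np.random.default_rng(0)
shape = (64, 100)  # (frequency bins, frames)
# Gate the "speech" so only about 30% of units are active, mimicking sparsity.
speech = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) * (rng.random(shape) < 0.3)
noise = 0.5 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
print(f"input SNR:      {output_snr_db(speech, noise, np.ones(shape)):.2f} dB")
print(f"IBM output SNR: {output_snr_db(speech, noise, ibm):.2f} dB")
print(f"IRM output SNR: {output_snr_db(speech, noise, irm):.2f} dB")
print(f"WDO measure of the IBM: {wdo_measure(speech, noise, ibm):.3f}")
```

With a 0 dB local criterion, the IBM is the IRM thresholded at 0.5. This follows because |S|²/(|S|²+|N|²) > 0.5 exactly when |S|² > |N|². Varying the noise scale (0.5 above) is a quick way to see how the WDO measure and the two output SNRs move together. This toy setup does not reproduce the paper's upper bound or its new ratio mask.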