On the optimality of ideal binary time-frequency masks

Authors:
Yipeng Li; DeLiang Wang
Affiliations:
Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210-1277, USA; Department of Computer Science and Engineering and Center of Cognitive Science, The Ohio State University, Columbus, OH 43210-1277, USA
Venue:
Speech Communication
Year:
2009

Abstract

The concept of ideal binary time-frequency masks has recently received attention in monaural and binaural sound separation. Although often assumed, the optimality of ideal binary masks in terms of signal-to-noise ratio (SNR) has not been rigorously addressed. In this paper we give a formal treatment of this issue and clarify the conditions under which ideal binary masks are optimal. We also experimentally compare the performance of ideal binary masks with that of ideal ratio masks on a speech mixture database and a music database. The results show that ideal binary masks are close in performance to ideal ratio masks, which are closely related to the Wiener filter, the theoretically optimal linear filter.
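
To make the two mask types concrete, the following is a minimal sketch, not the authors' experimental setup. It assumes random complex coefficients as stand-ins for the time-frequency units of a target and an interferer, a 0 dB local criterion for the binary mask, an energy-ratio definition for the ratio mask, and a signal-to-noise ratio computed directly in the time-frequency domain (which matches the time-domain value only for an orthonormal transform). All function and variable names are hypothetical.

```python
import numpy as np


def snr_db(target, estimate):
    """SNR in dB, treating everything that differs from the target as noise."""
    error_energy = np.sum(np.abs(target - estimate) ** 2)
    return 10.0 * np.log10(np.sum(np.abs(target) ** 2) / error_energy)


def ideal_binary_mask(target, interference):
    """1 where the target energy exceeds the interference energy, else 0.

    Assumption: a 0 dB local SNR criterion.
    """
    return (np.abs(target) ** 2 > np.abs(interference) ** 2).astype(float)


def ideal_ratio_mask(target, interference, eps=1e-12):
    """Target energy divided by total energy in each unit (Wiener-like gain).

    eps is an illustrative guard against division by zero.
    """
    s = np.abs(target) ** 2
    n = np.abs(interference) ** 2
    return s / (s + n + eps)


# Synthetic stand-ins for time-frequency coefficients (assumption: no real audio).
rng = np.random.default_rng(0)
shape = (64, 100)  # (frequency channels, time frames), chosen arbitrarily
target = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
interference = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = target + interference

ibm = ideal_binary_mask(target, interference)
irm = ideal_ratio_mask(target, interference)
random_mask = rng.integers(0, 2, size=shape).astype(float)

snr_mixture = snr_db(target, mixture)
snr_ibm = snr_db(target, ibm * mixture)
snr_irm = snr_db(target, irm * mixture)
snr_random = snr_db(target, random_mask * mixture)

print(f"unprocessed mixture: {snr_mixture:.2f} dB")
print(f"random binary mask:  {snr_random:.2f} dB")
print(f"ideal binary mask:   {snr_ibm:.2f} dB")
print(f"ideal ratio mask:    {snr_irm:.2f} dB")
```

In this simplified setting, retaining a unit leaves the interference energy as error and discarding it leaves the target energy as error, so the 0 dB criterion picks the smaller error in every unit and no other binary mask can achieve a higher SNR. The conditions under which that carries over to real filterbank or STFT processing are what the paper examines.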