Evaluating source separation algorithms with reverberant speech

Authors:
Michael I. Mandel;Scott Bressler;Barbara Shinn-Cunningham;Daniel P. W. Ellis
Affiliations:
Département d'informatique et de Recherche Opérationnelle, Université de Montréal, Montreal, QC, Canada;Department of Cognitive and Neural Systems, Boston University, Boston, MA;Department of Cognitive and Neural Systems, Boston University, Boston, MA;Department of Electrical Engineering, Columbia University, New York, NY
Venue:
IEEE Transactions on Audio, Speech, and Language Processing - Special issue on processing reverberant speech: methodologies and applications
Year:
2010

Abstract

This paper examines the performance of several source separation systems on a speech separation task for which human intelligibility has previously been measured. For anechoic mixtures, automatic speech recognition (ASR) performance on the separated signals is quite similar to human performance. In reverberation, however, while signal separation has some benefit for ASR, the results are still far below those of human listeners facing the same task. Performing this same experiment with a number of oracle masks created with a priori knowledge of the separated sources motivates a new objective measure of separation performance, the Direct-path, Early echo, and Reverberation, of the Target and Masker (DERTM), which is closely related to the ASR results. This measure indicates that while the nonoracle algorithms successfully reject the direct-path signal from the masking source, they reject less of its reverberation, explaining the disappointing ASR performance.
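
The oracle masks mentioned in the abstract are built with a priori knowledge of the separated sources. One common example of such a mask is the ideal binary mask, sketched below as an illustration; it is not necessarily one of the masks used in the paper. The function names, the 0 dB threshold, and the toy spectrogram values are all assumptions made for this sketch.

```python
import numpy as np


def ideal_binary_mask(target_spec, masker_spec, threshold_db=0.0):
    """Return a boolean time-frequency mask (hypothetical helper).

    A cell is kept when the target exceeds the masker by more than
    threshold_db. Both inputs are spectrograms of the same shape,
    known in advance, which is what makes the mask an "oracle".
    """
    eps = np.finfo(float).eps  # avoids log of zero and division by zero
    ratio = (np.abs(target_spec) + eps) / (np.abs(masker_spec) + eps)
    local_snr_db = 20.0 * np.log10(ratio)
    return local_snr_db > threshold_db


def apply_mask(mixture_spec, mask):
    """Zero out the mixture cells that the mask rejects."""
    return mixture_spec * mask


# Toy 2x2 magnitude spectrograms (assumed values, for illustration only).
target = np.array([[1.0, 0.1], [0.5, 2.0]])
masker = np.array([[0.2, 1.0], [0.5, 0.1]])
mixture = target + masker

mask = ideal_binary_mask(target, masker)
separated = apply_mask(mixture, mask)
```

In this toy case the mask keeps only the cells where the target is strictly louder than the masker; the cell where both are equal sits exactly at the threshold and is rejected. In reverberation, the choice of what counts as "target" (direct path only, or direct path plus early echoes) changes the resulting mask, which is the kind of distinction the DERTM measure is designed to expose.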