This paper examines the performance of several source separation systems on a speech separation task for which human intelligibility has previously been measured. For anechoic mixtures, automatic speech recognition (ASR) performance on the separated signals is quite similar to human performance. In reverberation, however, while signal separation has some benefit for ASR, the results are still far below those of human listeners facing the same task. Performing this same experiment with a number of oracle masks created with a priori knowledge of the separated sources motivates a new objective measure of separation performance, the Direct-path, Early echo, and Reverberation, of the Target and Masker (DERTM), which is closely related to the ASR results. This measure indicates that while the nonoracle algorithms successfully reject the direct-path signal from the masking source, they reject less of its reverberation, explaining the disappointing ASR performance.
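The abstract does not say which oracle masks were used. As an illustration only, the sketch below builds one widely used oracle mask, the ideal binary mask, which keeps a time-frequency cell when the target's power exceeds the masker's by more than a threshold. The function names, the 0 dB default threshold, and the list-of-lists spectrogram layout are assumptions made for this example, not details taken from the paper.

```python
import math


def ideal_binary_mask(target_power, masker_power, threshold_db=0.0):
    """Build a 0/1 mask from per-cell target and masker power.

    Both inputs are lists of rows (frequency x time, or time x frequency)
    with identical shapes. A cell is kept (1) when its local
    target-to-masker ratio in dB is strictly above threshold_db.
    This requires a priori access to the separated sources, which is
    what makes the mask an "oracle".
    """
    eps = 1e-12  # avoids division by zero and log of zero in silent cells
    mask = []
    for t_row, m_row in zip(target_power, masker_power):
        row = []
        for t, m in zip(t_row, m_row):
            ratio_db = 10.0 * math.log10((t + eps) / (m + eps))
            row.append(1 if ratio_db > threshold_db else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Multiply each mixture cell by the corresponding mask value."""
    return [
        [x * k for x, k in zip(x_row, k_row)]
        for x_row, k_row in zip(mixture, mask)
    ]


# Toy 2x2 example with made-up power values (not data from the paper).
target = [[4.0, 0.1], [1.0, 2.0]]
masker = [[1.0, 1.0], [1.0, 0.5]]
mixture = [[5.0, 1.1], [2.0, 2.5]]

mask = ideal_binary_mask(target, masker)
separated = apply_mask(mixture, mask)
```

In this toy example the mask keeps only the two cells where the target dominates. A nonoracle system has to estimate such a mask from the mixture alone, and that estimation is where the abstract reports that reverberant energy from the masker leaks through.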