Monaural speech separation and recognition challenge

Authors:
Martin Cooke;John R. Hershey;Steven J. Rennie
Affiliations:
Ikerbasque (Basque Science Foundation) Alameda Urquijo, 36-5, Plaza Bizkaia, 48011 Bilbao, Bizkaia, Spain and Departamento de Electricidad y Electrónica, Facultad de Ciencias y Tecnologí ...;IBM, T.J. Watson Research Center, Yorktown Heights, NY, USA;IBM, T.J. Watson Research Center, Yorktown Heights, NY, USA
Venue:
Computer Speech and Language
Year:
2010

Abstract

Robust speech recognition in everyday conditions requires the solution to a number of challenging problems, not least the ability to handle multiple sound sources. The specific case of speech recognition in the presence of a competing talker has been studied for several decades, resulting in a number of quite distinct algorithmic solutions whose focus ranges from modeling both target and competing speech to speech separation using auditory grouping principles. The purpose of the monaural speech separation and recognition challenge was to permit a large-scale comparison of techniques for the competing talker problem. The task was to identify keywords in sentences spoken by a target talker when mixed into a single channel with a background talker speaking similar sentences. Ten independent sets of results were contributed, alongside a baseline recognition system. Performance was evaluated using common training and test data and common metrics. Listeners' performance in the same task was also measured. This paper describes the challenge problem, compares the performance of the contributed algorithms, and discusses the factors which distinguish the systems. One highlight of the comparison was the finding that several systems achieved near-human performance in some conditions, and one out-performed listeners overall.