The PASCAL CHiME speech separation and recognition challenge

Authors:
Jon Barker; Emmanuel Vincent; Ning Ma; Heidi Christensen; Phil Green
Affiliations:
Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK (Jon Barker, Ning Ma, Heidi Christensen, Phil Green); INRIA, Centre de Rennes - Bretagne Atlantique, 35042 Rennes Cedex, France (Emmanuel Vincent)
Venue:
Computer Speech and Language
Year:
2013

Abstract

Distant-microphone speech recognition systems that operate with human-like robustness remain a distant goal. The key difficulty is that, in everyday listening conditions, the speech signal is reverberantly mixed into a noise background composed of multiple competing sound sources. This paper describes a recent speech recognition evaluation designed to bring together researchers from multiple communities in order to foster novel approaches to this problem. The task was to identify keywords in sentences reverberantly mixed into audio backgrounds that had been binaurally recorded in a busy domestic environment. The challenge was designed to capture the essential difficulties of the multisource environment problem while remaining on a scale accessible to a wide audience. Compared with previous ASR evaluations, a particular novelty of the task is that the utterances to be recognised were embedded in a continuous audio background rather than provided as pre-segmented utterances, thus allowing a range of background modelling techniques to be employed. The challenge attracted thirteen submissions. This paper describes the challenge problem, provides an overview of the submitted systems, and compares their performance with that of a baseline recognition system and of human listeners. Finally, the paper discusses insights gained from the challenge and lessons learnt for the design of future evaluations of this kind.
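The abstract describes the test material only at a high level: sentences reverberantly mixed into a continuous binaural background. As an illustration, the sketch below shows one plausible way to build such a mixture. A clean mono utterance is convolved with a left/right pair of binaural room impulse responses, and the result is added into a longer binaural background recording at a chosen onset. Here the SNR is controlled by choosing *where* in the background the utterance is placed, rather than by rescaling the speech. Whether the challenge data were constructed exactly this way is an assumption, and all function names (`reverberate`, `choose_onset`, `mix_at`, ...) are hypothetical.

```python
import math


def convolve(signal, ir):
    """Full linear convolution of a 1-D signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def reverberate(utterance, brir):
    """Convolve a mono utterance with a (left_ir, right_ir) pair.

    Returns a list of (left, right) samples. The shorter channel is
    zero-padded so both channels have the same length.
    """
    left = convolve(utterance, brir[0])
    right = convolve(utterance, brir[1])
    n = max(len(left), len(right))
    left += [0.0] * (n - len(left))
    right += [0.0] * (n - len(right))
    return list(zip(left, right))


def _energy(frames):
    # Sum of squared samples over both channels.
    return sum(l * l + r * r for l, r in frames)


def segment_snr_db(speech, background, onset):
    """SNR (dB) of the reverberant speech against the background span it overlaps."""
    noise = background[onset:onset + len(speech)]
    ps, pn = _energy(speech), _energy(noise)
    if pn == 0.0:
        return math.inf
    if ps == 0.0:
        return -math.inf
    return 10.0 * math.log10(ps / pn)


def choose_onset(speech, background, target_db, hop=1):
    """Hypothetical placement rule (an assumption, not the paper's procedure):
    pick the onset whose local SNR is closest to target_db, leaving the
    speech level unchanged (no rescaling)."""
    last = len(background) - len(speech)
    if last < 0:
        raise ValueError("background is shorter than the reverberant utterance")
    return min(range(0, last + 1, hop),
               key=lambda t: abs(segment_snr_db(speech, background, t) - target_db))


def mix_at(speech, background, onset):
    """Add the reverberant speech into a copy of the continuous background."""
    mixed = list(background)
    for k, (sl, sr) in enumerate(speech):
        bl, br = mixed[onset + k]
        mixed[onset + k] = (bl + sl, br + sr)
    return mixed
```

For example, with a toy background that is quiet for five samples and loud for five more, `choose_onset(speech, background, 0.0)` places a short reverberant utterance so that it straddles the boundary, where the local SNR is closest to 0 dB. Because the mixture returned by `mix_at` is the whole background rather than just the utterance span, a recogniser is free to model the background before and after the target sentence. This is the property the abstract highlights as a novelty of the task.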