Learning Patterns of Activity Using Real-Time Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence
Robust automatic speech recognition with missing and unreliable acoustic data
Speech Communication
On noise masking for automatic missing data speech recognition: A survey and discussion
Computer Speech and Language
Issues with uncertainty decoding for noise robust automatic speech recognition
Speech Communication
A speech fragment approach to localising multiple speakers in reverberant environments
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Speech fragment decoding techniques for simultaneous speaker identification and speech recognition
Computer Speech and Language
Blind Spatial Subtraction Array for Speech Enhancement in Noisy Environment
IEEE Transactions on Audio, Speech, and Language Processing
Combining Speech Fragment Decoding and Adaptive Noise Floor Modeling
IEEE Transactions on Audio, Speech, and Language Processing
The PASCAL CHiME speech separation and recognition challenge
Computer Speech and Language
Hi-index | 0.00 |
This paper addresses the problem of speech recognition in reverberant multisource noise conditions using distant binaural microphones. Our scheme employs a two-stage fragment decoding approach inspired by Bregman's account of auditory scene analysis, in which innate primitive grouping 'rules' are balanced by the role of learnt schema-driven processes. First, the acoustic mixture is split into local time-frequency fragments of individual sound sources using signal-level primitive grouping cues. Second, statistical models are employed to select fragments belonging to the sound source of interest, and the hypothesis-driven stage simultaneously searches for the most probable speech/background segmentation and the corresponding acoustic model state sequence. The paper reports recent advances in combining adaptive noise floor modelling and binaural localisation cues within this framework. By integrating signal-level grouping cues with acoustic models of the target sound source in a probabilistic framework, the system is able to simultaneously separate and recognise the sound of interest from the mixture, and derive significant recognition performance benefits from different grouping cue estimates despite their inherent unreliability in noisy conditions. Finally, the paper will show that missing data imputation can be applied via fragment decoding to allow reconstruction of a clean spectrogram that can be further processed and used as input to conventional ASR systems. The best performing system achieves an average keyword recognition accuracy of 85.83% on the PASCAL CHiME Challenge task.