Single-channel speech separation and recognition using loopy belief propagation
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Monaural speech separation and recognition challenge
Computer Speech and Language
A Bayesian estimation approach for speech enhancement using hidden Markov models
IEEE Transactions on Signal Processing
On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs
IEEE Transactions on Information Theory
Evaluating source separation algorithms with reverberant speech
IEEE Transactions on Audio, Speech, and Language Processing - Special issue on processing reverberant speech: methodologies and applications
Trends and advances in speech recognition
IBM Journal of Research and Development
The Markov selection model for concurrent speech recognition
Neurocomputing
A non-negative approach to language informed speech separation
LVA/ICA'12 Proceedings of the 10th international conference on Latent Variable Analysis and Signal Separation
The PASCAL CHiME speech separation and recognition challenge
Computer Speech and Language
Modelling non-stationary noise with spectral factorisation in automatic speech recognition
Computer Speech and Language
We present a system that can separate and recognize the simultaneous speech of two people recorded in a single channel. Applied to the monaural speech separation and recognition challenge, the system outperformed all other participants, including human listeners, with an overall recognition error rate of 21.6%, compared to the human error rate of 22.3%. The system consists of a speaker recognizer, a model-based speech separation module, and a speech recognizer. For the separation models, we explored a range of speech models that incorporate different levels of constraints on temporal dynamics to help infer the source speech signals. The system achieves its best performance when the model of temporal dynamics closely captures the grammatical constraints of the task. For inference, we compare a 2-D Viterbi algorithm and two loopy belief-propagation algorithms. We show how belief propagation reduces the complexity of temporal inference from exponential to linear in the number of sources and the size of the language model. The best belief-propagation method results in nearly the same recognition error rate as exact inference.
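To make the exponential-versus-linear claim concrete, the following is a minimal counting sketch, not the authors' implementation. It assumes each source is modeled by a chain with the same number of states, that joint decoding scores every pair of joint states per frame, and that a chain-wise scheme updates one source at a time. The function names, the state count `K`, and the iteration count are hypothetical, and the counts cover only the temporal (transition) part of inference.

```python
def joint_viterbi_transitions(num_states: int, num_sources: int) -> int:
    """Transitions scored per frame when all sources are decoded jointly.

    Assumption: the joint state is a tuple of per-source states, so there
    are num_states ** num_sources joint states and every pair is scored.
    """
    joint_states = num_states ** num_sources
    return joint_states ** 2


def chainwise_transitions(num_states: int, num_sources: int,
                          iterations: int = 1) -> int:
    """Transitions scored per frame when each source chain is updated in turn.

    Assumption: one chain update scores num_states ** 2 transitions, and
    each iteration performs one update per source.
    """
    return iterations * num_sources * num_states ** 2


if __name__ == "__main__":
    K = 100  # hypothetical number of states per source
    for n in (1, 2, 3):
        # Joint cost grows as K ** (2 * n); chain-wise cost grows as n * K ** 2.
        print(n, joint_viterbi_transitions(K, n), chainwise_transitions(K, n))
```

Under these assumptions, adding a source multiplies the joint cost by `K ** 2` but only adds `K ** 2` per iteration to the chain-wise cost, which is the sense in which the growth changes from exponential to linear in the number of sources.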