Single-channel speech separation and recognition using loopy belief propagation

Authors:
Steven J. Rennie; John R. Hershey; Peder A. Olsen
Affiliations:
IBM T.J. Watson Research Center, USA; IBM T.J. Watson Research Center, USA; IBM T.J. Watson Research Center, USA
Venue:
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Year:
2009

Abstract

We address the problem of single-channel speech separation and recognition using loopy belief propagation in a way that enables efficient inference for an arbitrary number of speech sources. The graphical model consists of a set of N Markov chains, each of which represents a language model or grammar for a given speaker. A Gaussian mixture model with shared states is used to model the hidden acoustic signal for each grammar state of each source. The combination of sources is modeled in the log spectrum domain using non-linear interaction functions. Previously, temporal inference in such a model has been performed using an N-dimensional Viterbi algorithm that scales exponentially with the number of sources. In this paper, we describe a loopy message passing algorithm that scales linearly with language model size. The algorithm achieves human levels of performance, and is an order of magnitude faster than competitive systems for two speakers.
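The message-passing idea in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper: only two sources, one scalar log-spectrum mean per state instead of a Gaussian mixture, a "log-max" interaction (the mixture log spectrum is approximated by the larger of the two source log spectra), and max-product messages. All names (`log_lik`, `chain_max_marginals`, `loopy_max_product`) and all numbers are hypothetical. Each sweep alternates between the two chains, so no joint search over pairs of state sequences is needed.

```python
import math

def log_lik(y, x_a, x_b, var=0.1):
    # Assumed interaction: log(exp(x_a) + exp(x_b)) is approximated by
    # max(x_a, x_b), observed with Gaussian noise (unnormalised log-density).
    return -0.5 * (y - max(x_a, x_b)) ** 2 / var

def chain_max_marginals(log_A, log_pi, evid):
    # Max-product forward-backward pass over a single Markov chain.
    # evid[t][k] is the incoming log-message for state k at frame t.
    T, K = len(evid), len(log_pi)
    fwd = [[0.0] * K for _ in range(T)]
    bwd = [[0.0] * K for _ in range(T)]
    for k in range(K):
        fwd[0][k] = log_pi[k] + evid[0][k]
    for t in range(1, T):
        for k in range(K):
            fwd[t][k] = evid[t][k] + max(
                fwd[t - 1][j] + log_A[j][k] for j in range(K))
    for t in range(T - 2, -1, -1):
        for j in range(K):
            bwd[t][j] = max(
                log_A[j][k] + evid[t + 1][k] + bwd[t + 1][k] for k in range(K))
    return [[fwd[t][k] + bwd[t][k] for k in range(K)] for t in range(T)]

def loopy_max_product(y, mu, log_A, log_pi, n_iter=5):
    # y: observed log-spectrum value per frame (scalar for simplicity).
    # mu[n][k]: assumed log-spectrum mean of state k of source n (n = 0, 1).
    T = len(y)
    K = [len(m) for m in mu]
    # Messages from each chain to the per-frame interaction factors (flat start).
    to_factor = [[[0.0] * K[n] for _ in range(T)] for n in range(2)]
    beliefs = [None, None]
    for _ in range(n_iter):
        for n in range(2):
            o = 1 - n  # index of the other source
            # Factor-to-chain message: maximise over the other source's state.
            evid = [[max(log_lik(y[t], mu[n][k], mu[o][j]) + to_factor[o][t][j]
                         for j in range(K[o]))
                     for k in range(K[n])]
                    for t in range(T)]
            beliefs[n] = chain_max_marginals(log_A[n], log_pi[n], evid)
            # Chain-to-factor message: belief with the factor's own message removed.
            to_factor[n] = [[beliefs[n][t][k] - evid[t][k] for k in range(K[n])]
                            for t in range(T)]
    # Decode each source by its per-frame best state.
    return [[max(range(K[n]), key=lambda k: beliefs[n][t][k]) for t in range(T)]
            for n in range(2)]

# Toy example with made-up numbers: two states per source, sticky transitions.
sticky = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
uniform = [math.log(0.5), math.log(0.5)]
paths = loopy_max_product(
    y=[5.0, 5.0, 3.0, 3.0],
    mu=[[0.0, 5.0], [1.0, 3.0]],
    log_A=[sticky, sticky],
    log_pi=[uniform, uniform],
)
print(paths)
```

In this toy setting each sweep costs on the order of T·K² operations per chain, whereas a joint Viterbi search over both chains would work on K² combined states per frame. The sketch does not show how the factor messages stay cheap for more than two sources; the abstract does not describe that step, so it is left out rather than guessed.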