The Markov selection model for concurrent speech recognition

Authors: Paris Smaragdis; Bhiksha Raj
Affiliations: University of Illinois, Urbana-Champaign, IL, USA and Adobe Systems Inc., USA; Carnegie Mellon University, Pittsburgh, PA, USA
Venue: Neurocomputing
Year: 2012

Abstract

In this paper we introduce a new Markov model that can recognize speech from recordings in which several speakers, known a priori, talk simultaneously. The work builds on recent non-negative representations of spectrograms, which have proven very effective for source separation. We extend these approaches to design a Markov selection model that can recognize sequences even when they are presented mixed together, and we do so without first separating the signals. Factorial Markov models, which have been used for similar tasks in the past, have state spaces that grow exponentially with the number of sources; in contrast, our model has low computational complexity, with a state space that grows only linearly in the number of sources. We demonstrate the framework by recognizing speech from mixtures of known speakers.
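To make the exponential-versus-linear contrast concrete, the sketch below counts states under two simplified layouts. It assumes a factorial model whose joint state is one combination of per-source states (the Cartesian product), and, as a hypothetical stand-in for a linear-size model, a layout where each source keeps its own chain so state counts add. The function names and the 100-states-per-speaker figure are illustrative assumptions, not the paper's parameterization.

```python
from itertools import product
from math import prod


def factorial_states(states_per_source):
    # A factorial model tracks one joint state per combination of source
    # states, so its size is the product of the per-source state counts.
    return prod(states_per_source)


def linear_states(states_per_source):
    # Hypothetical layout for illustration: each source keeps its own chain,
    # so the total number of states is the sum of the per-source counts.
    return sum(states_per_source)


if __name__ == "__main__":
    # Hypothetical setting: every speaker model has 100 states.
    for k in (1, 2, 3, 4):
        sizes = [100] * k
        print(f"{k} sources: factorial={factorial_states(sizes):,} states, "
              f"linear={linear_states(sizes):,} states")

    # Sanity check on a tiny case: enumerate the joint states explicitly.
    tiny = [2, 3]
    joint = list(product(*(range(n) for n in tiny)))
    assert len(joint) == factorial_states(tiny)
```

With 100 states per speaker, two speakers give 10,000 joint states versus 200, and four give 100,000,000 versus 400; dense transition matrices and Viterbi-style decoding costs scale with these counts squared, which is why the gap matters in practice.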