In this paper we introduce a new Markov model that can recognize speech in recordings of multiple speakers talking simultaneously, where the speakers are known a priori. This work builds on recent work on non-negative representations of spectrograms, which has been shown to be very effective in source separation problems. We extend these approaches to design a Markov selection model that can recognize sequences even when they are presented mixed together, without the need to perform separation on the signals. Factorial Markov models, which have been used for similar purposes in the past, have state spaces that are exponential in the number of sources; in contrast, this approach yields a model of low computational complexity whose state space is linear in the number of sources. We demonstrate the use of this framework in recognizing speech from mixtures of known speakers.
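
To make the complexity claim concrete, the following minimal sketch compares how the two state spaces grow. It is an illustration under stated assumptions, not the model from the paper: it assumes every source has the same number of states, that a factorial model tracks every joint combination of per-source states, and that the linear case simply pools the per-source states. The function names and the figure of 40 states per source are hypothetical.

```python
def factorial_state_count(states_per_source: int, num_sources: int) -> int:
    """Joint state count when every combination of per-source states is tracked.

    Assumption: each source contributes the same number of states, so the
    joint space has states_per_source ** num_sources elements.
    """
    return states_per_source ** num_sources


def linear_state_count(states_per_source: int, num_sources: int) -> int:
    """State count when the per-source states are pooled rather than combined.

    Assumption: a state space that is "linear in the number of sources" is
    modelled here as states_per_source * num_sources.
    """
    return states_per_source * num_sources


if __name__ == "__main__":
    states_per_source = 40  # hypothetical value, chosen only for illustration
    for num_sources in (1, 2, 3, 4):
        joint = factorial_state_count(states_per_source, num_sources)
        pooled = linear_state_count(states_per_source, num_sources)
        print(f"sources={num_sources}  factorial={joint}  linear={pooled}")
```

Under these assumptions the two counts coincide for a single source and diverge quickly as sources are added, which is the gap the abstract refers to.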