Combining missing-feature theory, speech enhancement, and speaker-dependent/-independent modeling for speech separation

Authors:
Ji Ming;Timothy J. Hazen;James R. Glass
Affiliations:
School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, Belfast BT7 1NN, UK;Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA;Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Venue:
Computer Speech and Language
Year:
2010

Abstract

This paper considers the separation and recognition of overlapped speech sentences from a single-channel observation. We propose a system that combines several techniques: a missing-feature approach to improve robustness to crosstalk and noise, a Wiener filter for speech enhancement, hidden Markov models for speech reconstruction, and speaker-dependent/-independent modeling for speaker and speech recognition. We develop the system on the Speech Separation Challenge database, where the task is to separate and recognize two mixed sentences without assuming advance knowledge of the speakers' identities or of the signal-to-noise ratio. The paper is an extended version of a previous conference paper submitted for the challenge.
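
As a rough illustration of two of the components named in the abstract, the sketch below computes a textbook per-bin Wiener gain and a simple reliability mask of the kind used in missing-feature methods. It is not the authors' implementation: the function names, the spectral-subtraction estimate of speech power, and the threshold value are all assumptions made for the example, and the abstract does not specify how the paper estimates these quantities.

```python
def wiener_gain(noisy_power, noise_power):
    """Textbook Wiener gain per frequency bin: G = S / (S + N).

    S (speech power) is estimated here by spectral subtraction,
    which is an assumption of this sketch, not the paper's method.
    """
    gains = []
    for y, n in zip(noisy_power, noise_power):
        s = max(y - n, 0.0)            # crude speech-power estimate
        denom = s + n
        gains.append(s / denom if denom > 0 else 0.0)
    return gains


def reliability_mask(noisy_power, noise_power, snr_threshold=1.0):
    """Mark a bin reliable when estimated speech power is at least
    snr_threshold times the interference power (linear scale).

    The threshold is a hypothetical parameter chosen for illustration.
    """
    mask = []
    for y, n in zip(noisy_power, noise_power):
        s = max(y - n, 0.0)
        mask.append(s >= snr_threshold * n)
    return mask


# Toy power spectra for one frame (made-up numbers).
noisy = [4.0, 1.0, 2.0]
noise = [1.0, 1.0, 3.0]
print(wiener_gain(noisy, noise))
print(reliability_mask(noisy, noise))
```

In a missing-feature recognizer, bins flagged unreliable would typically be ignored or marginalized when scoring against the acoustic models, while the Wiener gain would be applied to the noisy spectrum before reconstruction.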