Speaker-independent model-based single channel speech separation

Authors:
M. H. Radfar;R. M. Dansereau;A. Sayadiyan
Affiliations:
Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada;Department of Systems and Computer Engineering, Carleton University, Ottawa, Ontario, Canada;Department of Electrical Engineering, Amirkabir University of Technology, Tehran, Iran
Venue:
Neurocomputing
Year:
2008



Abstract

In this paper, we present a model-based single channel speech separation (SCSS) technique with two attributes. First, the proposed technique is speaker-independent. Second, it can separate speech signals even when they have been mixed at different energy levels. We derive a mathematical model in which the probability density function (PDF) of the mixed signal is expressed in terms of the envelopes and excitation signals of the sources and their associated gains. A maximum likelihood estimator is then used to estimate the sources' parameters and gains. The proposed technique is evaluated on male+male, male+female, and female+female mixtures. The experimental results show a significant signal-to-noise ratio (SNR) improvement over approaches that use the excitation signals or log spectra to separate the speech signals in the speaker-independent scenario.
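
As a rough illustration of the evaluation metric named in the abstract, the sketch below computes a time-domain SNR in dB and the SNR improvement of a separated estimate over the unprocessed mixture. The abstract does not state the exact SNR definition used in the experiments, so the formula, the function names (snr_db, snr_improvement_db), and the sample values are assumptions made only for illustration.

```python
import math


def snr_db(reference, estimate):
    """SNR in dB of `estimate` relative to `reference` (assumed time-domain definition)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # Energy of the clean reference signal.
    signal_power = sum(r * r for r in reference)
    # Energy of the residual error between reference and estimate.
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if signal_power == 0.0:
        raise ValueError("reference signal has zero energy")
    if noise_power == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_power / noise_power)


def snr_improvement_db(reference, mixture, estimate):
    """Gain in SNR (dB) of the separated estimate over the raw mixture."""
    return snr_db(reference, estimate) - snr_db(reference, mixture)


# Hypothetical toy signals: a clean source, the observed mixture, and a separated estimate.
clean = [1.0, 1.0]
mixed = [2.0, 2.0]
separated = [1.5, 1.5]
improvement = snr_improvement_db(clean, mixed, separated)
```

A positive value of improvement means the separated estimate is closer to the clean source than the mixture was. This is the sense in which separation methods are typically compared, although the paper's exact protocol may differ.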