A maximum likelihood estimation of vocal-tract-related filter characteristics for single channel speech separation

Authors:
Mohammad H. Radfar; Richard M. Dansereau; Abolghasem Sayadiyan
Affiliations:
Department of Electrical Engineering, Amirkabir University, Tehran, Iran; Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada; Department of Electrical Engineering, Amirkabir University, Tehran, Iran
Venue:
EURASIP Journal on Audio, Speech, and Music Processing
Year:
2007

Abstract

We present a new technique for separating two speech signals from a single recording. The proposed method bridges the gap between underdetermined blind source separation techniques and techniques that model the human auditory system, namely computational auditory scene analysis (CASA). To this end, we decompose each speech signal into an excitation signal and a vocal-tract-related filter, and then estimate these components from the mixed speech using a hybrid model. We first express the probability density function (PDF) of the mixed speech's log spectral vectors in terms of the PDFs of the underlying speech signals' vocal-tract-related filters. Then, the mean vectors of the vocal-tract-related filters' PDFs are obtained with a maximum likelihood estimator given the mixed signal. Finally, the estimated vocal-tract-related filters, together with the extracted fundamental frequencies, are used to reconstruct estimates of the individual speech signals. The proposed technique effectively adds vocal-tract-related filter characteristics as a new cue to CASA models through a new grouping technique based on underdetermined blind source separation. We compare our model with both an underdetermined blind source separation method and a CASA method. The experimental results show that our model outperforms both in terms of SNR improvement and percentage of crosstalk suppression.
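The abstract describes the estimation step only at a high level. The sketch below illustrates one way a maximum likelihood search for the two filters' mean vectors could look, under assumptions that are ours and are not stated in the abstract: (i) in the log-spectral domain each source frame is the sum of a log vocal-tract filter and a log excitation, (ii) the mixture is approximated per frequency bin by the larger of the two source log spectra (the log-max approximation), (iii) every bin is Gaussian with a shared, known standard deviation, and (iv) candidate filter mean vectors come from hypothetical per-speaker codebooks searched exhaustively. The names `ml_filter_means`, `mixmax_loglik`, and `binary_masks` are illustrative, and the binary-mask step is a swapped-in simplification, not the paper's reconstruction from fundamental frequencies.

```python
import math
import numpy as np

# Vectorised error function (math.erf only accepts scalars).
_erf = np.vectorize(math.erf)


def gauss_pdf(x, mean, std):
    """Elementwise Gaussian density N(x; mean, std^2)."""
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gauss_cdf(x, mean, std):
    """Elementwise Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + _erf((x - mean) / (std * math.sqrt(2.0))))


def mixmax_loglik(y, mean1, mean2, std):
    """Log-likelihood of a mixed log-spectral frame y, ASSUMING
    y_d = max(x1_d, x2_d) with independent x_i,d ~ N(mean_i,d, std^2).
    The PDF of the max of two independent variables is f1*F2 + f2*F1."""
    p = (gauss_pdf(y, mean1, std) * gauss_cdf(y, mean2, std)
         + gauss_pdf(y, mean2, std) * gauss_cdf(y, mean1, std))
    return float(np.sum(np.log(np.maximum(p, 1e-300))))


def ml_filter_means(y, exc1, exc2, codebook1, codebook2, std=1.0):
    """Exhaustive ML search over HYPOTHETICAL codebooks of log filter mean
    vectors. exc1/exc2 are log excitation spectra, assumed known (e.g. built
    from extracted fundamental frequencies). In the log domain the
    source-filter model is additive: x_i = h_i + e_i."""
    best_ll, best_ij = -np.inf, (0, 0)
    for i, h1 in enumerate(codebook1):
        for j, h2 in enumerate(codebook2):
            ll = mixmax_loglik(y, h1 + exc1, h2 + exc2, std)
            if ll > best_ll:
                best_ll, best_ij = ll, (i, j)
    return best_ij, best_ll


def binary_masks(mean1, mean2):
    """Assign each bin to the source whose estimated log spectrum is larger
    (a simple binary-mask reconstruction, assumed here for illustration)."""
    m1 = (mean1 >= mean2).astype(float)
    return m1, 1.0 - m1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, K = 64, 8                          # frequency bins, codewords per speaker
    cb1 = rng.normal(0.0, 6.0, (K, D))    # synthetic "speaker 1" filter means
    cb2 = rng.normal(0.0, 6.0, (K, D))    # synthetic "speaker 2" filter means
    e1, e2 = rng.normal(0.0, 3.0, D), rng.normal(0.0, 3.0, D)
    x1 = cb1[3] + e1 + rng.normal(0.0, 0.3, D)
    x2 = cb2[5] + e2 + rng.normal(0.0, 0.3, D)
    y = np.maximum(x1, x2)                # log-max mixture of the two frames
    (i, j), ll = ml_filter_means(y, e1, e2, cb1, cb2, std=0.5)
    print("estimated codewords:", i, j)
    m1, m2 = binary_masks(cb1[i] + e1, cb2[j] + e2)
```

The exhaustive search costs one likelihood evaluation per codeword pair per frame. It keeps the maximum likelihood step transparent, but large codebooks would call for pruning or a continuous optimizer over the mean vectors.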