Monaural speech segregation based on fusion of source-driven with model-driven techniques

  • Authors:
  • Mohammad H. Radfar; Richard M. Dansereau; Abolghasem Sayadiyan

  • Affiliations:
  • The Department of Systems and Computer Engineering, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6 and The Department of Electrical Engineering, Amirkabir University of Technology, 424 Hafez Avenue, Tehran 15875-4413, Iran
  • The Department of Systems and Computer Engineering, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6
  • The Department of Electrical Engineering, Amirkabir University of Technology, 424 Hafez Avenue, Tehran 15875-4413, Iran

  • Venue:
  • Speech Communication
  • Year:
  • 2007

Abstract

In this paper, a new single-channel speech segregation technique is presented that builds on prevalent methods in speech coding and synthesis. The technique integrates a model-driven method with a source-driven method to take advantage of both approaches and significantly reduce their individual pitfalls. We apply harmonic modelling, in which the pitch and spectral envelope are the main components of the analysis and synthesis stages. The pitch values of the two speakers are obtained with a source-driven method. The spectral envelope is obtained with a new model-driven technique consisting of four components: a trained codebook of vector-quantized envelopes (VQ-based separation), a mixture-maximum (MIXMAX) approximation, a minimum mean square error (MMSE) estimator, and a harmonic synthesizer. In contrast with previous model-driven techniques, this approach is speaker-independent; it can also separate unvoiced regions and suppress crosstalk, both of which are drawbacks of source-driven, or equivalently computational auditory scene analysis (CASA), models. We compare the fused model with both model-driven and source-driven techniques in subjective and objective experiments. The results show that although model-based separation delivers the best quality in the speaker-dependent case, the integrated model outperforms the individual approaches in the speaker-independent scenario. This result supports the idea that the human auditory system relies on both grouping cues (e.g., pitch tracking) and a priori knowledge (e.g., trained quantized envelopes) to segregate speech signals.
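
To make the envelope-estimation step concrete, below is a minimal NumPy sketch of the VQ/MIXMAX/MMSE idea described in the abstract: every pair of codebook envelopes is scored by how well its element-wise maximum explains the mixture's log envelope (the MIXMAX approximation), each source's envelope is then recovered as a posterior-weighted average of codewords (the MMSE estimate), and a toy harmonic synthesizer rebuilds a voiced frame from the result. This is an illustrative sketch under stated assumptions, not the paper's implementation: the function names, the single shared codebook, the Gaussian error model with variance sigma2, and all signal parameters are assumptions made for the example.

    import numpy as np

    def mixmax_mmse_separate(y_log, codebook, sigma2=1.0):
        # y_log:    (F,) log-spectral envelope of one mixture frame
        # codebook: (K, F) trained codebook of vector-quantized log envelopes
        # sigma2:   assumed variance of a Gaussian modelling error (illustrative)
        # MIXMAX approximation: the mixture's log envelope is roughly the
        # element-wise maximum of the two sources' log envelopes, so each
        # codeword pair (i, j) is scored by how well max(c_i, c_j) fits y_log.
        ci = codebook[:, None, :]                                # (K, 1, F)
        cj = codebook[None, :, :]                                # (1, K, F)
        err = ((y_log - np.maximum(ci, cj)) ** 2).sum(axis=-1)  # (K, K)
        logpost = -err / (2.0 * sigma2)
        post = np.exp(logpost - logpost.max())
        post /= post.sum()                                       # posterior over pairs
        # MMSE estimates: posterior-weighted averages of the codewords
        # playing the role of source 1 (index i) and source 2 (index j).
        x1_log = post.sum(axis=1) @ codebook                     # (F,)
        x2_log = post.sum(axis=0) @ codebook                     # (F,)
        return x1_log, x2_log

    def harmonic_synthesize(x_log, f0, fs=8000, n=256):
        # Toy harmonic synthesizer: rebuild a voiced frame as a sum of
        # cosines at multiples of the pitch f0, with amplitudes sampled
        # from the estimated log envelope (assumed to span 0 .. fs/2).
        t = np.arange(n) / fs
        freqs = np.arange(1, int((fs / 2) // f0) + 1) * f0
        bins = np.round(freqs / (fs / 2) * (len(x_log) - 1)).astype(int)
        amps = np.exp(x_log[bins])
        return (amps[:, None] * np.cos(2 * np.pi * np.outer(freqs, t))).sum(axis=0)

In the fused system described above, the per-frame pitch values fed to the synthesizer would come from the source-driven stage, while the codebook supplies the a priori envelope knowledge.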