We present an anthropomorphic approach in which auditory-based cues are combined with temporal correlation to implement a source separation system. The auditory features are based on spectral amplitude modulation and on energy information obtained through 256 cochlear filters. Segmentation and binding of auditory objects are performed with a two-layer spiking neural network: the first layer segments the auditory images into objects, while the second layer binds the auditory objects that belong to the same source. The binding is then used to generate a mask (binary gain) that suppresses the undesired sources in the original signal. Results are presented for a double-voiced (two-speaker) speech segment and for sentences corrupted by different noise sources, with comparative PESQ (Perceptual Evaluation of Speech Quality) scores. The spiking neural network is fully adaptive and unsupervised.
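The full pipeline (cochlear filterbank, modulation features, spiking-network binding) is beyond a short snippet, but the final masking step can be illustrated in isolation. The sketch below is a minimal approximation under two stated substitutions: an STFT stands in for the 256-channel cochlear filterbank, and an ideal binary mask (computed from the known target and interferer) stands in for the binding decision that the spiking network would produce. It shows how a binary gain per time-frequency cell suppresses the undesired source before resynthesis.

```python
import numpy as np

def stft(x, win=256, hop=128):
    """Hann-windowed short-time Fourier transform (frames x bins)."""
    w = np.hanning(win)
    starts = range(0, len(x) - win + 1, hop)
    return np.array([np.fft.rfft(w * x[s:s + win]) for s in starts])

def istft(X, win=256, hop=128):
    """Overlap-add resynthesis, normalized by the summed squared window."""
    w = np.hanning(win)
    n = (len(X) - 1) * hop + win
    y, wsum = np.zeros(n), np.zeros(n)
    for k, frame in enumerate(X):
        s = k * hop
        y[s:s + win] += w * np.fft.irfft(frame, win)
        wsum[s:s + win] += w * w
    return y / np.maximum(wsum, 1e-8)

def ideal_binary_mask(target, interferer):
    """Binary gain: 1 where the target dominates a time-frequency cell."""
    T, I = stft(target), stft(interferer)
    return (np.abs(T) > np.abs(I)).astype(float)

# Toy demo: a 300 Hz "target" mixed with a 2.5 kHz "interferer".
fs = 8000
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 300 * t)
interferer = 0.8 * np.sin(2 * np.pi * 2500 * t)
mix = target + interferer

mask = ideal_binary_mask(target, interferer)
estimate = istft(mask * stft(mix))  # masked resynthesis of the mixture
```

In the paper's system the mask is produced without access to the clean sources: the second network layer groups channels whose activity is temporally correlated and the mask follows from that grouping. The ideal mask here only demonstrates the suppression mechanism itself.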