Separation of speech from interfering sounds based on oscillatory correlation

  • Authors:
  • DeLiang L. Wang; G. J. Brown

  • Affiliations:
  • Dept. of Comput. & Inf. Sci., Ohio State Univ., Columbus, OH

  • Venue:
  • IEEE Transactions on Neural Networks
  • Year:
  • 1999

Abstract

A multistage neural model is proposed for an auditory scene analysis task: segregating speech from interfering sound sources. The core of the model is a two-layer oscillator network that performs stream segregation on the basis of oscillatory correlation. In the oscillatory correlation framework, a stream is represented by a population of synchronized relaxation oscillators, each of which corresponds to an auditory feature, and different streams are represented by desynchronized oscillator populations. Lateral connections between oscillators encode harmonicity and proximity in frequency and time. The oscillator network is preceded by a model of the auditory periphery and a stage in which mid-level auditory representations are formed. The model has been systematically evaluated on a corpus of voiced speech mixed with interfering sounds, and it yields an improvement in signal-to-noise ratio for every mixture. A number of issues, including biological plausibility and real-time implementation, are also discussed.
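
For readers unfamiliar with the building block of the oscillatory correlation framework, the sketch below simulates a single Terman-Wang-style relaxation oscillator with forward-Euler integration: a fast excitatory variable x and a slow recovery variable y, where a positive external input I puts the oscillator into a stable limit cycle. The parameter values (epsilon, gamma, beta, I) and the integration settings are illustrative assumptions, not the configuration of the paper's two-layer network, which additionally couples many such oscillators through lateral connections and a global inhibitor.

```python
import numpy as np

def simulate_relaxation_oscillator(I=0.8, epsilon=0.02, gamma=6.0, beta=0.1,
                                   dt=0.01, steps=20000):
    """Simulate one Terman-Wang-style relaxation oscillator (illustrative parameters).

    x: fast (excitatory) variable, y: slow recovery variable.
    I > 0 models a stimulated oscillator; I <= 0 leaves it on the silent branch.
    """
    x, y = -2.0, 0.0                      # start near the silent branch
    xs = np.empty(steps)
    for t in range(steps):
        dx = 3.0 * x - x**3 + 2.0 - y + I                       # cubic fast nullcline plus input
        dy = epsilon * (gamma * (1.0 + np.tanh(x / beta)) - y)  # slow, sigmoid-gated recovery
        x += dt * dx
        y += dt * dy
        xs[t] = x
    return xs

if __name__ == "__main__":
    trace = simulate_relaxation_oscillator()
    # A stimulated oscillator alternates between an active phase (x high)
    # and a silent phase (x low); inspect the range of x as a crude check.
    print(f"x range: [{trace.min():.2f}, {trace.max():.2f}]")
```

In the oscillatory correlation scheme, oscillators that belong to the same stream are driven into phase by excitatory lateral connections, while a global inhibitor keeps different streams desynchronized; this single-unit sketch only illustrates the dynamics that those connections act upon.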