Monaural speech segregation based on pitch tracking and amplitude modulation

  • Authors:
  • Guoning Hu; DeLiang Wang

  • Affiliations:
  • Biophys. Program, Ohio State Univ., Columbus, OH, USA

  • Venue:
  • IEEE Transactions on Neural Networks
  • Year:
  • 2004

Abstract

Segregating speech from one monaural recording has proven to be very challenging. Monaural segregation of voiced speech has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with the high-frequency part of speech. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. We propose a novel system for voiced speech segregation that segregates resolved and unresolved harmonics differently. For resolved harmonics, the system generates segments based on temporal continuity and cross-channel correlation, and groups them according to their periodicities. For unresolved harmonics, it generates segments based on common amplitude modulation (AM) in addition to temporal continuity and groups them according to AM rates. Underlying the segregation process is a pitch contour that is first estimated from speech segregated according to dominant pitch and then adjusted according to psychoacoustic constraints. Our system is systematically evaluated and compared with previous systems, and it yields substantially better performance, especially for the high-frequency part of speech.
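The abstract mentions two per-channel cues: cross-channel correlation of periodicity responses for resolved (low-frequency) harmonics, and AM rate for unresolved (high-frequency) harmonics. The following is a minimal Python sketch of how such cues can be computed from filter-channel outputs; the function names, parameter choices, and pitch-range bounds are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalized autocorrelation of a 1-D frame up to max_lag samples."""
    x = x - np.mean(x)
    denom = np.sum(x * x)
    if denom == 0:
        return np.zeros(max_lag + 1)
    return np.array([np.sum(x[:len(x) - lag] * x[lag:])
                     for lag in range(max_lag + 1)]) / denom

def cross_channel_correlation(acf_a, acf_b):
    """Correlation between the autocorrelation responses of two adjacent
    filter channels; a high value suggests both channels are driven by
    the same resolved harmonic and may belong to one segment."""
    a = acf_a - np.mean(acf_a)
    b = acf_b - np.mean(acf_b)
    norm = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / norm) if norm > 0 else 0.0

def am_rate(envelope, fs, f0_range=(80.0, 400.0)):
    """Estimate the dominant amplitude-modulation rate (Hz) of a
    high-frequency channel from the autocorrelation of its envelope.
    The pitch range (80-400 Hz) is an assumed search bound."""
    max_lag = int(fs / f0_range[0])
    min_lag = int(fs / f0_range[1])
    acf = autocorrelation(envelope, max_lag)
    lag = min_lag + int(np.argmax(acf[min_lag:max_lag + 1]))
    return fs / lag

# Hypothetical usage with a synthetic envelope modulated at 125 Hz.
fs = 16000
t = np.arange(0, 0.02, 1.0 / fs)
env = 1.0 + 0.8 * np.cos(2 * np.pi * 125 * t)
print(round(am_rate(env, fs)))  # approximately 125
```

In this sketch, segments whose AM rate (or autocorrelation periodicity) matches the estimated pitch contour would be grouped into the target speech stream; the grouping and pitch-tracking logic described in the abstract is not shown here.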