Segregating speech from a single monaural recording has proven to be very challenging. Monaural segregation of voiced speech has been studied in previous systems that incorporate auditory scene analysis principles. A major problem for these systems is their inability to deal with the high-frequency part of speech. Psychoacoustic evidence suggests that different perceptual mechanisms are involved in handling resolved and unresolved harmonics. We propose a novel system for voiced speech segregation that segregates resolved and unresolved harmonics differently. For resolved harmonics, the system generates segments based on temporal continuity and cross-channel correlation, and groups them according to their periodicities. For unresolved harmonics, it generates segments based on common amplitude modulation (AM) in addition to temporal continuity, and groups them according to AM rates. Underlying the segregation process is a pitch contour that is first estimated from speech segregated according to dominant pitch and then adjusted according to psychoacoustic constraints. Our system is systematically evaluated and compared with previous systems, and it yields substantially better performance, especially for the high-frequency part of speech.
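The cross-channel correlation cue used for resolved harmonics can be sketched as follows. This is an illustrative toy example, not the paper's implementation: the function names, filter responses, and parameters below are assumptions. The idea is that adjacent auditory filter channels dominated by the same harmonic exhibit similar periodicity (autocorrelation) patterns, so a high correlation between those patterns suggests the two channels belong in the same segment.

```python
import numpy as np

def autocorr(x, max_lag):
    """Normalized autocorrelation of a channel response up to max_lag samples."""
    x = x - x.mean()
    ac = np.array([np.sum(x[: len(x) - l] * x[l:]) for l in range(max_lag)])
    return ac / (ac[0] + 1e-12)

def cross_channel_correlation(resp_a, resp_b, max_lag=150):
    """Correlate the autocorrelation functions of two adjacent filter channels.

    Values near 1 indicate the channels share the same periodicity and
    are candidates for grouping into one segment.
    """
    a = autocorr(resp_a, max_lag)
    b = autocorr(resp_b, max_lag)
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

# Two hypothetical channel responses driven by the same 100 Hz harmonic
# (fs = 10 kHz, 50 ms window); amplitude and phase differ, periodicity agrees.
fs = 10000
t = np.arange(0, 0.05, 1 / fs)
ch1 = np.sin(2 * np.pi * 100 * t)
ch2 = 0.8 * np.sin(2 * np.pi * 100 * t + 0.3)
print(cross_channel_correlation(ch1, ch2))  # near 1: same periodicity
```

Because the autocorrelation of a sinusoid is independent of its phase and amplitude after normalization, the two channels yield nearly identical periodicity patterns, which is exactly what makes this cue robust for segment formation on resolved harmonics.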