Speech segregation, also known as the cocktail party problem, has proven extremely challenging. This chapter describes a monaural computational auditory scene analysis (CASA) approach to the problem. The approach performs auditory segmentation and grouping in a two-dimensional time-frequency representation that encodes proximity in frequency and time, periodicity, amplitude modulation, and onset/offset. In segmentation, the model decomposes the input mixture into contiguous time-frequency segments. Grouping is performed first for voiced speech: detected pitch contours are used to group voiced segments into a target stream and the background, with resolved and unresolved harmonics treated differently. Unvoiced segments are grouped by Bayesian classification of acoustic-phonetic features. This CASA approach has led to major advances towards solving the cocktail party problem.
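The two stages named above, segmentation followed by grouping, can be illustrated with a toy sketch. It is not the chapter's model. Segmentation here is plain 4-connected component labelling on a grid of active time-frequency units, standing in for the onset/offset and periodicity cues the abstract mentions. Grouping assumes a precomputed grid, called `pitch_match` here, marking the units whose periodicity agrees with the target pitch contour. The function names, the `min_fraction` threshold, and the input grids are all hypothetical.

```python
from collections import deque


def label_segments(active):
    """Label 4-connected regions of truthy cells in a channels x frames grid.

    Returns (labels, n_segments); label 0 means "no segment".
    Assumption: contiguity alone defines a segment in this sketch.
    """
    n_ch = len(active)
    n_fr = len(active[0]) if n_ch else 0
    labels = [[0] * n_fr for _ in range(n_ch)]
    n_segments = 0
    for c in range(n_ch):
        for t in range(n_fr):
            if not active[c][t] or labels[c][t]:
                continue
            # Start a new segment and flood-fill its neighbours.
            n_segments += 1
            labels[c][t] = n_segments
            queue = deque([(c, t)])
            while queue:
                ci, ti = queue.popleft()
                for cj, tj in ((ci - 1, ti), (ci + 1, ti),
                               (ci, ti - 1), (ci, ti + 1)):
                    if (0 <= cj < n_ch and 0 <= tj < n_fr
                            and active[cj][tj] and not labels[cj][tj]):
                        labels[cj][tj] = n_segments
                        queue.append((cj, tj))
    return labels, n_segments


def group_segments(labels, n_segments, pitch_match, min_fraction=0.5):
    """Build a binary target mask from labelled segments.

    A segment joins the target stream when more than `min_fraction` of its
    units are flagged in `pitch_match` (hypothetical per-unit agreement with
    the target pitch); all other units go to the background (0).
    """
    total = [0] * (n_segments + 1)
    matched = [0] * (n_segments + 1)
    for label_row, match_row in zip(labels, pitch_match):
        for lab, hit in zip(label_row, match_row):
            if lab:
                total[lab] += 1
                matched[lab] += 1 if hit else 0
    target = {s for s in range(1, n_segments + 1)
              if matched[s] > min_fraction * total[s]}
    return [[1 if lab in target else 0 for lab in row] for row in labels]


# Toy input: 3 frequency channels x 5 time frames.
active = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
pitch_match = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
]
labels, n_segments = label_segments(active)
mask = group_segments(labels, n_segments, pitch_match)
```

In this toy input the left block of units forms one segment and the right-hand units form another. Only the left segment has a majority of pitch-matched units, so only it enters the target mask. Deciding per segment rather than per unit is the design point being illustrated: a few isolated mismatches inside a segment do not split it.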