IEEE Transactions on Audio, Speech, and Language Processing
We describe a content-based audio classification algorithm that uses novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task is to discriminate speech from nonspeech, where the nonspeech class consists of animal vocalizations, music, and environmental sounds. Although this task is relatively easy for humans, it remains difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. It generates a multidimensional spectro-temporal representation of the sound, which is then reduced by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). We evaluate how well the system generalizes to signals with high levels of additive noise and reverberation and compare it with two existing approaches (Scheirer and Slaney, 2002; Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation.
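The reduction step can be illustrated with a minimal sketch. The abstract does not specify which multilinear technique is used, so the code below assumes a truncated higher-order SVD (one orthonormal basis per feature axis) as a stand-in; it is not the authors' implementation. The function names, the tensor shape, and the retained ranks are all hypothetical, and random data takes the place of real auditory-model output.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor` along `mode`: rows index that mode, columns everything else."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_bases(samples, ranks):
    """Return one orthonormal basis per feature mode (assumed HOSVD-style truncation).

    samples: array of shape (n_samples, d1, d2, ...); axis 0 is the sample axis.
    ranks:   number of components to keep for each feature mode.
    """
    bases = []
    for mode, rank in enumerate(ranks, start=1):  # skip axis 0 (samples)
        u, _, _ = np.linalg.svd(unfold(samples, mode), full_matrices=False)
        bases.append(u[:, :rank])  # leading left singular vectors
    return bases

def project(samples, bases):
    """Project every feature mode onto its basis and flatten to one vector per sample."""
    reduced = samples
    for mode, basis in enumerate(bases, start=1):
        # tensordot appends the new (reduced) axis last; move it back into place.
        contracted = np.tensordot(reduced, basis, axes=(mode, 0))
        reduced = np.moveaxis(contracted, -1, mode)
    return reduced.reshape(reduced.shape[0], -1)

# Hypothetical data: 20 sounds, each an 8 x 5 x 6 feature tensor.
rng = np.random.default_rng(0)
samples = rng.standard_normal((20, 8, 5, 6))
bases = mode_bases(samples, ranks=(3, 2, 4))
features = project(samples, bases)  # one 3*2*4 = 24-dimensional vector per sound
```

The resulting rows of `features` are ordinary fixed-length vectors, so they could be passed to any standard SVM implementation for the speech/nonspeech decision. In practice the bases would be estimated on training data only and then reused to project unseen sounds.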