Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations

Authors:
N. Mesgarani;M. Slaney;S. A. Shamma
Affiliations:
Electr. & Comput. Eng. Dept., Univ. of Maryland, College Park, MD, USA;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Citing 0
Cited 13

Exploiting Contextual Information for Speech/Non-Speech Detection

TSD '08 Proceedings of the 11th international conference on Text, Speech and Dialogue
Auditory Gist Perception: An Alternative to Attentional Selection of Auditory Streams?

Attention in Cognitive Systems. Theories and Systems from an Interdisciplinary Viewpoint
Extraction of speech-relevant information from modulation spectrograms

Progress in nonlinear speech processing
Non-negative multilinear principal component analysis of auditory temporal modulations for music genre classification

IEEE Transactions on Audio, Speech, and Language Processing
Word recognition with a hierarchical neural network

NOLISP'07 Proceedings of the 2007 international conference on Advances in nonlinear speech processing
Robust feature extraction for speaker recognition based on constrained nonnegative tensor factorization

Journal of Computer Science and Technology
Discrimination of speech from nonspeech in broadcast news based on modulation frequency features

Speech Communication
A hierarchical framework for spectro-temporal feature extraction

Speech Communication
Robust speech detection in real acoustic backgrounds with perceptually motivated features

Speech Communication
A New Truncation Strategy for the Higher-Order Singular Value Decomposition

SIAM Journal on Scientific Computing
A clustering based feature selection method in spectro-temporal domain for speech recognition

Engineering Applications of Artificial Intelligence
A scale-rate filter selection method in the spectro-temporal domain for phoneme classification

Computers and Electrical Engineering
Noise-robust speech recognition through auditory feature detection and spike sequence decoding

Neural Computation

Abstract

We describe a content-based audio classification algorithm that uses novel multiscale spectro-temporal modulation features inspired by a model of auditory cortical processing. The task is to discriminate speech from nonspeech consisting of animal vocalizations, music, and environmental sounds. Although this task is relatively easy for humans, it remains difficult to automate well, especially in noisy and reverberant environments. The auditory model captures basic processes occurring from the early cochlear stages to the central cortical areas. It generates a multidimensional spectro-temporal representation of the sound, which is then analyzed by a multilinear dimensionality reduction technique and classified by a support vector machine (SVM). The system's generalization to signals with high levels of additive noise and reverberation is evaluated and compared with two existing approaches (Scheirer and Slaney, 2002; Kingsbury et al., 2002). The results demonstrate the advantages of the auditory model over the other two systems, especially at low signal-to-noise ratios (SNRs) and high reverberation.
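The multilinear dimensionality reduction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact procedure, so the code below uses a generic HOSVD-style projection as a stand-in: each feature mode of the tensor is unfolded, and only its leading singular vectors are kept. The axis names (frequency, rate, scale), the tensor sizes, the ranks, the random data, and the helper names `unfold`, `mode_bases`, and `project` are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize `tensor` so that axis `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_bases(samples, ranks):
    """Leading left singular vectors for each feature mode.

    samples: array of shape (n_samples, d1, ..., dK)
    ranks:   number of components kept per feature mode (length K)
    """
    bases = []
    for mode, rank in enumerate(ranks, start=1):  # axis 0 is the sample axis
        u, _, _ = np.linalg.svd(unfold(samples, mode), full_matrices=False)
        bases.append(u[:, :rank])
    return bases

def project(samples, bases):
    """Project every feature mode onto its reduced basis."""
    out = samples
    for mode, basis in enumerate(bases, start=1):
        # Contract axis `mode` with the basis, then move the new axis back in place.
        out = np.moveaxis(np.tensordot(out, basis, axes=(mode, 0)), -1, mode)
    return out

rng = np.random.default_rng(0)
# Hypothetical sizes: 20 sounds, 16 frequency channels, 6 rates, 5 scales.
samples = rng.standard_normal((20, 16, 6, 5))
bases = mode_bases(samples, ranks=(4, 3, 2))
reduced = project(samples, bases)
# Flatten each reduced tensor into one feature vector per sound.
features = reduced.reshape(len(samples), -1)
print(features.shape)
```

In a full pipeline, the rows of `features` would be passed, together with speech/nonspeech labels, to a classifier such as an SVM. That step is omitted here to keep the sketch dependent on NumPy only.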