Bioinspired sparse spectro-temporal representation of speech for robust classification

Authors:
C. Martínez; J. Goddard; D. Milone; H. Rufiner
Affiliations:
Centro de I+D en Señales, Sistemas e INteligencia Computacional (SINC(i)), Dpto. Informática, Facultad de Ingeniería, Universidad Nacional del Litoral, CC217, Ciudad Universitaria, ...; Dpto. de Ingeniería Eléctrica, UAM-Iztapalapa, Mexico; Centro de I+D en Señales, Sistemas e INteligencia Computacional (SINC(i)), Dpto. Informática, Facultad de Ingeniería, Universidad Nacional del Litoral, CC217, Ciudad Universitaria, ...; Centro de I+D en Señales, Sistemas e INteligencia Computacional (SINC(i)), Dpto. Informática, Facultad de Ingeniería, Universidad Nacional del Litoral, CC217, Ciudad Universitaria, ...
Venue:
Computer Speech and Language
Year:
2012

Abstract

In this work, a first approach to robust phoneme recognition using a biologically inspired feature extraction method is presented. The proposed technique approximates the representation of the speech signal at the level of the auditory cortex. It is based on an optimal dictionary of atoms, estimated from auditory spectrograms, and on the Matching Pursuit algorithm, which is used to approximate the cortical activations. The result is a sparse code with intrinsic noise robustness, which can therefore be exploited when the system is used in adverse environments. The recognition task consisted of classifying a set of five easily confused English phonemes under both clean and noisy conditions. Multilayer perceptrons were trained as classifiers, and their performance was compared with that obtained using other classic and robust parameterizations: the auditory spectrogram, probabilistic optimum filtering (POF) applied to Mel-frequency cepstral coefficients (MFCCs), and perceptual linear prediction (PLP) coefficients. The results showed that the cortical representation significantly improved the recognition rate over these parameterizations for both clean and noisy phonemes.
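The abstract describes the sparse coding step only at a high level. As an illustration, here is a minimal sketch of generic greedy Matching Pursuit (Mallat and Zhang) over a fixed dictionary with unit-norm atoms, written in Python with NumPy. The random dictionary, the sizes `m` and `K`, the number of atoms `n_atoms` and the synthetic input signal are illustrative assumptions. In the paper the dictionary is estimated from auditory spectrograms, and neither that estimation step nor the authors' exact Matching Pursuit settings are reproduced here.

```python
import numpy as np


def matching_pursuit(x, D, n_atoms):
    """Greedy Matching Pursuit.

    x: signal vector of length m (e.g. a flattened spectro-temporal patch).
    D: dictionary of shape (m, K) whose columns (atoms) have unit L2 norm.
    n_atoms: number of greedy iterations (upper bound on nonzero coefficients).
    Returns coefficients a (length K) and residual r with x == D @ a + r.
    """
    r = np.asarray(x, dtype=float).copy()
    a = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        corr = D.T @ r                      # inner product of residual with every atom
        k = int(np.argmax(np.abs(corr)))    # atom that best matches the residual
        a[k] += corr[k]                     # accumulate its coefficient
        r -= corr[k] * D[:, k]              # remove its contribution from the residual
    return a, r


# Hypothetical example: 64-dimensional patches and a 4x overcomplete random
# dictionary standing in for a learned one (assumption, not the paper's dictionary).
rng = np.random.default_rng(0)
m, K = 64, 256
D = rng.standard_normal((m, K))
D /= np.linalg.norm(D, axis=0)              # normalize each atom to unit norm

# Synthetic signal built from three atoms.
x = D[:, [3, 50, 200]] @ np.array([2.0, -1.5, 0.7])
a, r = matching_pursuit(x, D, n_atoms=10)
print("nonzero coefficients:", np.count_nonzero(a))
print("relative residual:", np.linalg.norm(r) / np.linalg.norm(x))
```

The coefficient vector `a` has at most `n_atoms` nonzero entries, which is the kind of sparse, activation-like code the abstract refers to. Stopping after a fixed number of atoms, rather than when the residual energy falls below a threshold, is a simplifying choice made for this sketch.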