Speech Communication - Special issue on speech processing in adverse conditions
In this paper we present a hierarchical framework for the extraction of spectro-temporal acoustic features. The features are designed for higher robustness in dynamic environments. Motivated by the large gap between human and machine performance under such conditions, we take inspiration from the organization of the mammalian auditory cortex in the design of our features. This includes the joint processing of spectral and temporal information, the organization in hierarchical layers, competition between coequal features, the use of high-dimensional sparse feature spaces, and the learning of the underlying receptive fields in a data-driven manner. Owing to these properties, we term them hierarchical spectro-temporal (HIST) features. To learn the features at the first layer we use Independent Component Analysis (ICA); at the second layer of our feature hierarchy we apply Non-Negative Sparse Coding (NNSC) to obtain features spanning a larger frequency and time region. We investigate the contribution of the different parts of this feature extraction process to the overall performance, including an analysis of the benefits of the hierarchical processing, a comparison of different feature extraction methods at the first layer, an evaluation of the feature competition, and an investigation of the influence of different receptive field sizes at the second layer. Additionally, we compare our features to MFCC and RASTA-PLP features in a continuous digit recognition task in noise, both on a wideband dataset we constructed based on the Aurora-2 task and on the actual Aurora-2 database. We show that a combination of the proposed HIST features and RASTA-PLP features yields significant improvements, and that the proposed features carry information complementary to RASTA-PLP and MFCC features.