Speech Communication - Special issue on speech processing in adverse conditions
In this paper we present a hierarchical framework for the extraction of spectro-temporal acoustic features. The features are designed for higher robustness in dynamic environments. Motivated by the large gap between human and machine performance under such conditions, we take inspiration from the organization of the mammalian auditory cortex in the design of our features. This includes the joint processing of spectral and temporal information, the organization in hierarchical layers, competition between coequal features, the use of high-dimensional sparse feature spaces, and the learning of the underlying receptive fields in a data-driven manner. Owing to these properties, we term them hierarchical spectro-temporal (HIST) features. To learn the features at the first layer we use Independent Component Analysis (ICA); at the second layer of our feature hierarchy we apply Non-Negative Sparse Coding (NNSC) to obtain features spanning a larger frequency and time region. We investigate the contribution of the different parts of this feature extraction process to the overall performance, including an analysis of the benefits of the hierarchical processing, a comparison of different feature extraction methods at the first layer, an evaluation of the feature competition, and an investigation of the influence of different receptive field sizes at the second layer. Additionally, we compare our features to MFCC and RASTA-PLP features in a continuous digit recognition task in noise, both on a wideband dataset we constructed based on the Aurora-2 task and on the actual Aurora-2 database. We show that a combination of the proposed HIST features and RASTA-PLP features yields significant improvements, and that the proposed features carry information complementary to RASTA-PLP and MFCC features.