A common technique in visual object recognition is to sparsely encode low-level input with a feature dictionary and then pool the resulting codes over local spatial neighbourhoods. While some methods stack these stages in alternating layers to form hierarchies, the two stages alone can also produce state-of-the-art results. Following its success in vision, this framework is moving into speech and audio processing tasks. We investigate the effect of architectural choices when the framework is applied to a spoken digit recognition task. We find that unsupervised learning of the features has a negligible effect on classification accuracy; the number and size of the features are a greater determinant of recognition performance. Finally, we show that, given an optimised architecture, sparse coding performs comparably with Hidden Markov Models (HMMs) and outperforms K-means clustering.
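The two-stage pipeline described above — sparse encoding against a feature dictionary, then pooling over local neighbourhoods — can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the random, unit-norm dictionary stands in for learned (or unlearned) features, and a simple soft-thresholding encoder stands in for a full sparse solver.

```python
import numpy as np

def sparse_encode(patches, dictionary, threshold=0.5):
    """Encode patches as sparse codes: project each patch onto the
    dictionary atoms, then soft-threshold the activations so that most
    coefficients are exactly zero (a crude stand-in for a sparse solver)."""
    activations = patches @ dictionary.T  # shape (n_patches, n_atoms)
    return np.sign(activations) * np.maximum(np.abs(activations) - threshold, 0.0)

def max_pool(codes, pool_size):
    """Max-pool the sparse codes over non-overlapping local neighbourhoods
    along the patch (time) axis, one pooled vector per neighbourhood."""
    n = (codes.shape[0] // pool_size) * pool_size
    regions = codes[:n].reshape(-1, pool_size, codes.shape[1])
    return regions.max(axis=1)  # shape (n_regions, n_atoms)

rng = np.random.default_rng(0)
dictionary = rng.normal(size=(64, 40))  # 64 atoms over 40-dim patches (illustrative sizes)
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)

patches = rng.normal(size=(100, 40))    # e.g. spectro-temporal patches from one utterance
codes = sparse_encode(patches, dictionary)
features = max_pool(codes, pool_size=10)  # (10, 64) pooled feature map
```

Pooled feature vectors like `features` would then be concatenated and fed to a linear classifier; in the setting the abstract describes, the number and size of the dictionary atoms matter more for recognition than how the dictionary was obtained.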