Effects of architecture choices on sparse coding in speech recognition

  • Authors:
  • Fionntán O'Donnell; Fabian Triefenbach; Jean-Pierre Martens; Benjamin Schrauwen

  • Affiliations:
  • Ghent University, Department of Electronics and Information Systems, Sint-Pietersnieuwstraat, Ghent, Belgium (all authors)

  • Venue:
  • ICANN'12: Proceedings of the 22nd International Conference on Artificial Neural Networks and Machine Learning, Volume Part I
  • Year:
  • 2012

Abstract

A common technique in visual object recognition is to sparsely encode low-level input with a feature dictionary and then pool the resulting codes spatially over local neighbourhoods. While some methods stack these two stages in alternating layers to form hierarchies, the two stages alone can also produce state-of-the-art results. Having originated in vision, this framework is now moving into speech and audio processing tasks. We investigate the effect of architectural choices when the framework is applied to a spoken digit recognition task. We find that unsupervised learning of the features has a negligible effect on classification, whereas the number and size of the features are a greater determinant of recognition performance. Finally, we show that, given an optimised architecture, sparse coding performs comparably to Hidden Markov Models (HMMs) and outperforms K-means clustering.
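The two-stage encode-then-pool framework can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not specify. The input is treated as a sequence of feature frames (for example spectral frames), and the dictionary is fixed rather than learned. Each frame is encoded with a soft-threshold encoder max(0, d_k · x − α), a common fast stand-in for sparse coding that is not necessarily the authors' encoder. Pooling is max-pooling over non-overlapping windows of consecutive frames, which plays the role of spatial pooling in the time direction. For contrast, the sketch includes a hard-assignment K-means encoder that produces a one-hot code for the nearest centroid. All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def soft_threshold_encode(frame: Vector, dictionary: List[Vector], alpha: float) -> Vector:
    """Assumed sparse encoder: max(0, d_k . x - alpha) for every atom d_k.

    A larger alpha zeroes more coefficients and gives a sparser code.
    """
    return [max(0.0, dot(atom, frame) - alpha) for atom in dictionary]


def kmeans_encode(frame: Vector, centroids: List[Vector]) -> Vector:
    """Hard K-means assignment: one-hot vector marking the nearest centroid."""
    dists = [sum((f - c) ** 2 for f, c in zip(frame, centroid)) for centroid in centroids]
    best = dists.index(min(dists))
    return [1.0 if k == best else 0.0 for k in range(len(centroids))]


def max_pool(codes: List[Vector], window: int) -> List[Vector]:
    """Max-pool each code dimension over non-overlapping windows of frames.

    A trailing window shorter than `window` is pooled over the frames it has.
    """
    pooled = []
    for start in range(0, len(codes), window):
        block = codes[start:start + window]
        pooled.append([max(column) for column in zip(*block)])
    return pooled


if __name__ == "__main__":
    # Toy 2-D "frames" and a 2-atom dictionary (illustrative values only).
    D = [[1.0, 0.0], [0.0, 1.0]]
    frames = [[1.0, 0.0], [0.25, 1.0], [0.0, 0.5], [0.75, 0.25]]

    sparse_codes = [soft_threshold_encode(f, D, alpha=0.25) for f in frames]
    print(max_pool(sparse_codes, window=2))  # [[0.75, 0.75], [0.5, 0.25]]

    kmeans_codes = [kmeans_encode(f, D) for f in frames]
    print(max_pool(kmeans_codes, window=2))  # [[1.0, 1.0], [1.0, 1.0]]
```

The toy run shows one reason the two encoders can behave differently. After pooling, the graded soft-threshold codes still separate the two windows, but the binary K-means codes collapse both windows to the same vector. The pooled vectors would then be concatenated and passed to a classifier. In this framework, the dictionary size (the number of atoms) and the atom dimensionality (the size of each feature) are the architectural knobs.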