Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint

  • Authors: Paul D. O'Grady; Barak A. Pearlmutter

  • Affiliations: Complex & Adaptive Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland; Hamilton Institute, National University of Ireland Maynooth, Co. Kildare, Ireland

  • Venue: Neurocomputing
  • Year: 2008

Abstract

Discovering a representation that allows auditory data to be parsimoniously represented is useful for many machine learning and signal processing tasks. Such a representation can be constructed by non-negative matrix factorisation (NMF), a method for finding parts-based representations of non-negative data. Here, we present an extension to convolutive NMF that includes a sparseness constraint, where the resultant algorithm has multiplicative updates and utilises the beta divergence as its reconstruction objective. In combination with a spectral magnitude transform of speech, this method discovers auditory objects that resemble speech phones along with their associated sparse activation patterns. We use these in a supervised separation scheme for monophonic mixtures, finding improved separation performance in comparison to standard convolutive NMF.
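
The kind of algorithm the abstract describes can be sketched in a few lines of NumPy. The sketch below is illustrative only and is not the authors' exact update rules: it assumes the usual convolutive model, in which a non-negative spectrogram V is approximated by a sum over time lags t of W[t] multiplied by H shifted t columns to the right. It uses the generic multiplicative updates for the beta divergence and an L1 penalty on the activations H as the sparseness constraint. The function names, the penalty weight `lam`, and the step that rescales each basis to unit L2 norm (to stop the penalty being evaded by rescaling) are all assumptions made for this example.

```python
import numpy as np

EPS = 1e-12  # small constant to avoid division by zero (assumption)


def shift(X, t):
    """Shift the columns of X by t places (right if t > 0, left if t < 0), zero-filling."""
    out = np.zeros_like(X)
    if t == 0:
        out[:] = X
    elif t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out


def reconstruct(W, H):
    """Convolutive model: sum over lags t of W[t] @ shift(H, t)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))


def beta_divergence(V, L, beta):
    """Beta divergence between V and its reconstruction L (beta=1 is generalised KL)."""
    V, L = V + EPS, L + EPS
    if beta == 1:
        return np.sum(V * np.log(V / L) - V + L)
    if beta == 0:
        return np.sum(V / L - np.log(V / L) - 1)
    return np.sum(V**beta + (beta - 1) * L**beta
                  - beta * V * L**(beta - 1)) / (beta * (beta - 1))


def sparse_conv_nmf(V, R, T, beta=1.0, lam=0.1, n_iter=200, seed=0):
    """Hypothetical sketch: convolutive NMF with an L1 penalty (weight lam) on H.

    V: (M, N) non-negative matrix; R: number of bases; T: basis length in frames.
    Returns W with shape (T, M, R) and H with shape (R, N).
    """
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((T, M, R)) + EPS
    H = rng.random((R, N)) + EPS
    for _ in range(n_iter):
        # Activation update; lam in the denominator shrinks H towards zero.
        L = reconstruct(W, H) + EPS
        A, B = V * L**(beta - 2), L**(beta - 1)
        num = sum(W[t].T @ shift(A, -t) for t in range(T))
        den = sum(W[t].T @ shift(B, -t) for t in range(T)) + lam
        H *= num / (den + EPS)

        # Basis update, one lag at a time, against the refreshed reconstruction.
        L = reconstruct(W, H) + EPS
        A, B = V * L**(beta - 2), L**(beta - 1)
        for t in range(T):
            Ht = shift(H, t)
            W[t] *= (A @ Ht.T) / (B @ Ht.T + EPS)

        # Rescale each basis to unit L2 norm and compensate in H, which
        # leaves the reconstruction unchanged (a heuristic, not from the paper).
        norms = np.sqrt((W**2).sum(axis=(0, 1))) + EPS
        W /= norms
        H *= norms[:, None]
    return W, H
```

Calling `sparse_conv_nmf(V, R, T)` on a magnitude spectrogram `V` would return `R` spectro-temporal bases of `T` frames each, together with their activation rows. In a supervised separation setting like the one the abstract mentions, bases learned separately for each speaker could be concatenated and held fixed while only H is updated on the mixture; that step is not shown here.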