Discovering speech phones using convolutive non-negative matrix factorisation with a sparseness constraint

Authors:
Paul D. O'Grady; Barak A. Pearlmutter
Affiliations:
Complex & Adaptive Systems Laboratory, University College Dublin, Belfield, Dublin 4, Ireland; Hamilton Institute, National University of Ireland Maynooth, Co. Kildare, Ireland
Venue:
Neurocomputing
Year:
2008

Abstract

Discovering a parsimonious representation of auditory data is useful for many machine learning and signal processing tasks. Such a representation can be constructed by non-negative matrix factorisation (NMF), a method for finding parts-based representations of non-negative data. Here, we present an extension to convolutive NMF that includes a sparseness constraint; the resulting algorithm has multiplicative updates and uses the beta divergence as its reconstruction objective. In combination with a spectral magnitude transform of speech, this method discovers auditory objects that resemble speech phones, along with their associated sparse activation patterns. We use these in a supervised separation scheme for monophonic mixtures and find improved separation performance compared with standard convolutive NMF.
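
The kind of algorithm the abstract describes can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several choices are assumptions made for illustration only: the function names (`shift`, `reconstruct`, `sparse_conv_nmf`), the parameter names, the beta-divergence convention (beta = 2 Euclidean, beta = 1 Kullback-Leibler, beta = 0 Itakura-Saito), the use of an L1 penalty `lam` on the activations as the sparseness constraint, and the unit-norm rescaling of the bases. The model is V ≈ sum over t of W[t] times H shifted t columns to the right, with generic multiplicative updates derived from the gradient of the penalised beta divergence.

```python
import numpy as np

def shift(X, t):
    """Shift columns of X right by t (t > 0) or left (t < 0), zero-filling."""
    if t == 0:
        return X.copy()
    out = np.zeros_like(X)
    if t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out

def reconstruct(W, H):
    """Convolutive model: sum_t W[t] @ shift(H, t).
    W has shape (T, M, R): T time frames, M frequency bins, R bases.
    H has shape (R, N): one activation row per basis."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))

def sparse_conv_nmf(V, R, T, beta=1.0, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Sketch of sparse convolutive NMF with multiplicative updates.
    lam is an assumed L1 weight on H; all defaults are illustrative."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((T, M, R)) + eps
    H = rng.random((R, N)) + eps
    for _ in range(n_iter):
        # Activation update: ratio of the negative and positive gradient
        # parts, with the sparsity weight added to the denominator.
        L = reconstruct(W, H) + eps
        A = V * L ** (beta - 2)
        B = L ** (beta - 1)
        num = sum(W[t].T @ shift(A, -t) for t in range(T))
        den = sum(W[t].T @ shift(B, -t) for t in range(T)) + lam
        H *= num / (den + eps)

        # Basis update, one time frame of the bases at a time.
        L = reconstruct(W, H) + eps
        A = V * L ** (beta - 2)
        B = L ** (beta - 1)
        for t in range(T):
            Ht = shift(H, t)
            W[t] *= (A @ Ht.T) / (B @ Ht.T + eps)

        # Rescale each basis to unit Euclidean norm and compensate in H,
        # so the reconstruction is unchanged (a common heuristic that stops
        # the penalty being avoided by shrinking H and growing W).
        norms = np.sqrt((W ** 2).sum(axis=(0, 1))) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

# Example on synthetic non-negative data (a stand-in for a magnitude spectrogram).
rng = np.random.default_rng(1)
V_demo = rng.random((12, 40)) + 1e-3
W_demo, H_demo = sparse_conv_nmf(V_demo, R=2, T=3, n_iter=50)
```

In practice `V` would be a magnitude spectrogram, each slice `W[:, :, r]` a short time-frequency patch, and row `r` of `H` the times at which that patch is active. Larger values of `lam` push more activations towards zero at the cost of reconstruction accuracy.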