In this paper we are interested in non-negative matrix factorization (NMF) with the Itakura-Saito (IS) divergence. Previous work has demonstrated the relevance of this cost function for the decomposition of audio power spectrograms. This is in particular due to its scale invariance, which makes it more robust to the wide dynamics of audio, a property not shared by other popular costs such as the Euclidean distance or the generalized Kullback-Leibler (KL) divergence. However, while the latter two cost functions are convex, the IS divergence is not, which makes it more prone to convergence to irrelevant local minima, as observed empirically. Thus, the aim of this paper is to propose a tempering scheme that favors convergence of IS-NMF to global minima. Our algorithm is based on NMF with the beta-divergence, where the shape parameter beta acts as a temperature parameter. Results on both synthetic and music data (in a transcription context) show the relevance of our approach.
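The tempering idea described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' exact algorithm: it uses the standard multiplicative updates for beta-NMF (beta = 2 gives the Euclidean distance, beta = 1 the KL divergence, beta = 0 the IS divergence) and assumes a simple linear annealing schedule for beta; the actual schedule, initialization, and stopping criteria in the paper may differ.

```python
import numpy as np

def beta_nmf_update(V, W, H, beta, eps=1e-12):
    """One pass of the standard multiplicative updates for NMF
    under the beta-divergence (beta=2: Euclidean, 1: KL, 0: IS)."""
    WH = W @ H + eps
    H *= (W.T @ (V * WH ** (beta - 2.0))) / (W.T @ WH ** (beta - 1.0) + eps)
    WH = W @ H + eps
    W *= ((V * WH ** (beta - 2.0)) @ H.T) / (WH ** (beta - 1.0) @ H.T + eps)
    return W, H

def tempered_is_nmf(V, rank, n_iter=200, beta_start=2.0, beta_end=0.0, seed=0):
    """Tempered IS-NMF sketch: start from the convex Euclidean cost
    (beta=2) and anneal beta linearly down to the IS divergence (beta=0),
    using beta as a temperature parameter. Schedule is an assumption."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + 1e-3
    H = rng.random((rank, N)) + 1e-3
    for it in range(n_iter):
        beta = beta_start + (beta_end - beta_start) * it / max(n_iter - 1, 1)
        W, H = beta_nmf_update(V, W, H, beta)
    return W, H
```

Starting at beta = 2 lets the early iterations work on a convex cost surface; as beta decreases toward 0, the objective morphs into the scale-invariant IS divergence, so the factorization is refined on the target cost while hopefully avoiding the poor local minima that a cold start at beta = 0 tends to find.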