Estimating the spatial position of spectral components in audio

Authors:
R. Mitchell Parry; Irfan Essa
Affiliations:
College of Computing / GVU Center, Georgia Institute of Technology, Atlanta, Georgia (both authors)
Venue:
ICA'06: Proceedings of the 6th International Conference on Independent Component Analysis and Blind Signal Separation
Year:
2006

Abstract

One way to separate sources from a single mixture recording is to extract spectral components and then combine them to form estimates of the sources. Grouping the components, however, remains a difficult problem. When multiple mixture signals are available, we propose clustering the components by their relative contribution to each mixture (i.e., their spatial position). We introduce novel factorizations of magnitude spectrograms from multiple recordings and derive update rules that extend independent subspace analysis and non-negative matrix factorization to estimate concurrently the spectral shape, time envelope, and spatial position of each component. We show that the estimated component positions lie near the positions of their corresponding sources, and that multichannel non-negative matrix factorization can distinguish three pianos by their positions in the mixture.
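The factorization and grouping steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the multichannel model is a nonnegative three-way (PARAFAC-style) factorization of the stacked magnitude spectrograms, V[c, f, t] ≈ Σ_k Q[c, k] W[f, k] H[t, k]. Here W holds spectral shapes, H time envelopes, and Q spatial positions. It also assumes Lee and Seung style multiplicative updates for the squared Euclidean cost, which may differ from the cost and update rules derived in the paper. The names `reconstruct`, `multichannel_nmf`, and `group_by_position` are hypothetical. K-means on the columns of Q is a swapped-in choice for the clustering step. In practice V would be built by stacking the magnitude STFT of each channel; the example below uses a synthetic tensor.

```python
import numpy as np


def reconstruct(Q, W, H):
    """Model tensor: V_hat[c, f, t] = sum_k Q[c, k] * W[f, k] * H[t, k]."""
    return np.einsum("ck,fk,tk->cft", Q, W, H)


def multichannel_nmf(V, n_components, n_iter=200, eps=1e-12, seed=0):
    """Sketch of a multichannel NMF of a (channels, freqs, frames) tensor V.

    Assumed model: W[:, k] is the spectral shape, H[:, k] the time envelope and
    Q[:, k] the spatial position (per-channel gain) of component k.
    Uses multiplicative updates for the squared Euclidean cost (an assumption).
    """
    rng = np.random.default_rng(seed)
    n_channels, n_freqs, n_frames = V.shape
    Q = rng.random((n_channels, n_components)) + 0.1
    W = rng.random((n_freqs, n_components)) + 0.1
    H = rng.random((n_frames, n_components)) + 0.1
    errors = []
    for _ in range(n_iter):
        # Update each factor with the other two held fixed: the numerator
        # uses the data V, the denominator the current reconstruction.
        R = reconstruct(Q, W, H)
        W *= np.einsum("cft,ck,tk->fk", V, Q, H) / (np.einsum("cft,ck,tk->fk", R, Q, H) + eps)
        R = reconstruct(Q, W, H)
        H *= np.einsum("cft,ck,fk->tk", V, Q, W) / (np.einsum("cft,ck,fk->tk", R, Q, W) + eps)
        R = reconstruct(Q, W, H)
        Q *= np.einsum("cft,fk,tk->ck", V, W, H) / (np.einsum("cft,fk,tk->ck", R, W, H) + eps)
        # Remove the scale ambiguity: each position column sums to 1 (relative
        # contribution to each mixture) and the gain moves into H, so the
        # reconstruction is unchanged.
        scale = Q.sum(axis=0) + eps
        Q /= scale
        H *= scale
        errors.append(float(np.sum((V - reconstruct(Q, W, H)) ** 2)))
    return Q, W, H, errors


def group_by_position(Q, n_sources, n_iter=50):
    """Hypothetical grouping step: k-means on the component position vectors."""
    X = Q.T  # one row per component
    # Deterministic farthest-point initialisation of the cluster centres.
    centers = [X[0]]
    for _ in range(1, n_sources):
        dist = np.min([np.sum((X - m) ** 2, axis=1) for m in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        for j in range(n_sources):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


# Synthetic two-channel example: components 0-1 sit mostly in channel 0,
# components 2-3 mostly in channel 1 (made-up positions, not data from the paper).
rng = np.random.default_rng(1)
Q_true = np.array([[0.9, 0.9, 0.2, 0.2], [0.1, 0.1, 0.8, 0.8]])
V = reconstruct(Q_true, rng.random((64, 4)), rng.random((40, 4)))
Q, W, H, errors = multichannel_nmf(V, n_components=4)
labels = group_by_position(Q, n_sources=2)
```

Normalizing each column of Q to sum to one is what lets it be read as a spatial position. It expresses a component's relative contribution to each mixture independently of its loudness, which is the quantity the grouping step clusters. This sketch does not guarantee recovery of the true positions; the outcome depends on initialization and convergence.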