Extension of Sparse, Adaptive Signal Decompositions to Semi-blind Audio Source Separation

Authors:
Andrew Nesbit; Emmanuel Vincent; Mark D. Plumbley
Affiliations:
Andrew Nesbit and Mark D. Plumbley: School of Electronic Engineering and Computer Science, Queen Mary University of London, London E1 4NS, United Kingdom; Emmanuel Vincent: METISS Group, IRISA-INRIA, 35042 Rennes Cedex, France
Venue:
ICA '09 Proceedings of the 8th International Conference on Independent Component Analysis and Signal Separation
Year:
2009

Abstract

We apply sparse, fast and flexible adaptive lapped orthogonal transforms to underdetermined audio source separation using the time-frequency masking framework. This framework normally requires the sources to overlap as little as possible in the time-frequency plane. In this work, we apply our adaptive transform schemes to the semi-blind case, in which the mixing system is known but the sources are unknown. By assuming that exactly two sources are active at each time-frequency index, we determine both the adaptive transforms and the estimated source coefficients using ℓ1-norm minimisation. We show average performance of 12 to 13 dB signal-to-distortion ratio (SDR) on speech and music mixtures, and show that the adaptive transform scheme offers improvements on the order of several tenths of a dB over transforms with constant block length. Comparison with previously studied upper bounds suggests that there is significant potential for future improvement.
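The per-coefficient estimation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a stereo (two-channel) mixture, real-valued transform coefficients (as lapped orthogonal transforms produce), and a known 2 × N mixing matrix `A`. At each time-frequency index it tries every pair of sources, solves the resulting 2 × 2 system exactly, and keeps the pair whose solution has the smallest ℓ1 norm. All function names are hypothetical. `choose_transform` is a deliberately simplified stand-in for the adaptive step: it picks among whole candidate transforms by total ℓ1 cost and does not reproduce the paper's adaptive segmentation search.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed layout: A[c][n] is the gain from source n to channel c (2 channels).
Matrix = Sequence[Sequence[float]]
Stereo = Tuple[float, float]


def solve_2x2(a11: float, a12: float, a21: float, a22: float,
              x1: float, x2: float) -> Optional[Tuple[float, float]]:
    """Solve [[a11, a12], [a21, a22]] @ (s1, s2) = (x1, x2) by Cramer's rule.

    Returns None when the 2x2 sub-matrix is (numerically) singular."""
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        return None
    return (x1 * a22 - a12 * x2) / det, (a11 * x2 - a21 * x1) / det


def estimate_coefficient(A: Matrix, x: Stereo) -> Tuple[float, List[float]]:
    """Estimate all N source coefficients at one time-frequency index.

    Assumes exactly two sources are active: every pair (i, j) is solved
    exactly, and the pair whose solution has the smallest l1 norm
    |s_i| + |s_j| is kept. The other N - 2 coefficients are set to zero."""
    n = len(A[0])
    best_cost, best = float("inf"), [0.0] * n
    for i, j in combinations(range(n), 2):
        sol = solve_2x2(A[0][i], A[0][j], A[1][i], A[1][j], x[0], x[1])
        if sol is None:
            continue  # sources i and j cannot be separated from this mixture
        cost = abs(sol[0]) + abs(sol[1])
        if cost < best_cost:
            best_cost, best = cost, [0.0] * n
            best[i], best[j] = sol
    return best_cost, best


def choose_transform(A: Matrix, candidates: Dict[str, Sequence[Stereo]]) -> str:
    """Hypothetical simplification of the adaptive step: given the mixture
    coefficients produced by several candidate transforms, return the name
    of the one whose separated coefficients have the smallest total l1 norm."""
    def total_l1(coeffs: Sequence[Stereo]) -> float:
        return sum(estimate_coefficient(A, x)[0] for x in coeffs)
    return min(candidates, key=lambda name: total_l1(candidates[name]))


if __name__ == "__main__":
    # Toy 2 x 3 mixing matrix: source 0 -> left, source 1 -> right,
    # source 2 -> both channels equally.
    A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
    # Mixture of sources 1 and 2 with amplitudes 2 and 3 gives x = (3, 5).
    cost, s = estimate_coefficient(A, (3.0, 5.0))
    print(cost, s)  # 5.0 [0.0, 2.0, 3.0]
```

The exhaustive search costs N(N−1)/2 small linear solves per coefficient, which stays cheap for the handful of sources typical of audio mixtures. In the toy example the minimum-ℓ1 pair coincides with the true active pair; in general the ℓ1 criterion is a sparsity heuristic and need not recover the true sources exactly.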