Audio source separation with a signal-adaptive local cosine transform

Authors:
Andrew Nesbit;Mark D. Plumbley;Mike E. Davies
Affiliations:
Queen Mary, University of London, Centre for Digital Music, Department of Electronic Engineering, Mile End Road, London, E1 4NS, UK;Queen Mary, University of London, Centre for Digital Music, Department of Electronic Engineering, Mile End Road, London, E1 4NS, UK;University of Edinburgh, IDCOM & Joint Research Institute for Signal and Image Processing, King's Buildings, Mayfield Road, Edinburgh, EH9 3JL, UK
Venue:
Signal Processing
Year:
2007

Abstract

Audio source separation is a very challenging problem, and many different approaches have been proposed to solve it. We consider the problem of separating sources from two-channel instantaneous audio mixtures. One approach is to transform the mixtures into the time-frequency domain, where the sources have approximately disjoint representations, and then to separate the sources by time-frequency masking. We focus on demixing the sources by binary masking and assume that the mixing parameters are known. In this paper, we investigate the use of cosine packet (CP) trees as the foundation for the transform. We determine an appropriate transform by applying a computationally efficient best basis algorithm to a set of candidate local cosine bases organised in a tree structure. We develop a heuristically motivated cost function that selects the basis maximising the energy of the transform coefficients associated with a particular source. Finally, we objectively evaluate the proposed transform method by comparing it against fixed-basis transforms such as the short-time Fourier transform (STFT) and the modified discrete cosine transform (MDCT). Evaluation results indicate that the proposed method outperforms the MDCT and is competitive with the STFT, and informal listening tests suggest that it exhibits less objectionable noise than the STFT.
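The pipeline described above (binary masking with known mixing parameters, plus a best basis search over a tree of cosine bases) can be illustrated with a small NumPy sketch. It is not the authors' implementation, and several simplifications are assumptions. The mixing matrix `A` is taken to have unit-norm columns. Each coefficient is assigned to the source whose mixing direction it lies closest to. Smooth overlapping local cosine windows are replaced by rectangular-window block DCT-II bases on dyadic segments, which are a degenerate but still orthonormal local cosine basis. The node cost, the negative energy of the coefficients masked to one target source, is one hypothetical reading of the heuristic. All function names (`dct_matrix`, `binary_mask_demix`, `node_cost`, `best_basis`, `separate`) are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are basis vectors). Stand-in for a smooth local cosine basis."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (t + 0.5) * k / n)
    c[0, :] /= np.sqrt(2.0)
    return c

def binary_mask_demix(X, A):
    """X: 2 x K mixture coefficients; A: 2 x J mixing matrix with unit-norm columns (assumed known).
    Each coefficient goes to the source whose direction a_j maximises |a_j^T x|,
    i.e. the direction with the smallest distance to x. It is estimated as a_j^T x."""
    proj = A.T @ X                                   # J x K projections onto mixing directions
    labels = np.argmax(np.abs(proj), axis=0)         # winning source per coefficient
    mask = labels[None, :] == np.arange(A.shape[1])[:, None]
    return proj * mask                               # binary mask: zero for non-winning sources

def node_cost(seg, A, j):
    """Hypothetical additive cost: minus the energy of coefficients masked to source j."""
    X = seg @ dct_matrix(seg.shape[1]).T             # DCT of each channel (rows of seg)
    S = binary_mask_demix(X, A)
    return -np.sum(S[j] ** 2)

def best_basis(x, A, j, max_depth):
    """Bottom-up dyadic best basis search: keep a node unless its two children are cheaper."""
    def search(start, length, depth):
        own = node_cost(x[:, start:start + length], A, j)
        if depth == max_depth or length < 2:
            return own, [(start, length)]
        half = length // 2
        c0, b0 = search(start, half, depth + 1)
        c1, b1 = search(start + half, length - half, depth + 1)
        if c0 + c1 < own:
            return c0 + c1, b0 + b1
        return own, [(start, length)]
    return search(0, x.shape[1], 0)

def separate(x, A, segments):
    """Transform each segment, apply binary masking, and invert the orthonormal DCT."""
    out = np.zeros((A.shape[1], x.shape[1]))
    for start, length in segments:
        C = dct_matrix(length)
        S = binary_mask_demix(x[:, start:start + length] @ C.T, A)
        out[:, start:start + length] = S @ C         # row-wise inverse DCT (C is orthonormal)
    return out
```

For example, with `thetas = np.array([np.pi / 6, np.pi / 3])` and `A = np.vstack([np.cos(thetas), np.sin(thetas)])`, you would call `best_basis(A @ S, A, 0, 3)` to pick a segmentation for source 0 and then `separate(A @ S, A, segments)` to demix. Because the sketch uses one segmentation for all sources and hard window boundaries, it will show blocking effects that the smooth CP windows in the paper are designed to avoid.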