Audio source separation with a signal-adaptive local cosine transform

  • Authors:
  • Andrew Nesbit; Mark D. Plumbley; Mike E. Davies

  • Affiliations:
  • Queen Mary, University of London, Centre for Digital Music, Department of Electronic Engineering, Mile End Road, London, E1 4NS, UK; Queen Mary, University of London, Centre for Digital Music, Department of Electronic Engineering, Mile End Road, London, E1 4NS, UK; University of Edinburgh, IDCOM & Joint Research Institute for Signal and Image Processing, King's Buildings, Mayfield Road, Edinburgh, EH9 3JL, UK

  • Venue:
  • Signal Processing
  • Year:
  • 2007

Abstract

Audio source separation is a challenging problem for which many approaches have been proposed. We consider the problem of separating sources from two-channel instantaneous audio mixtures. One approach is to transform the mixtures into the time-frequency domain to obtain approximately disjoint representations of the sources, and then to separate the sources using time-frequency masking. We focus on demixing the sources by binary masking, and assume that the mixing parameters are known. In this paper, we investigate cosine packet (CP) trees as a foundation for the transform. We determine an appropriate transform by applying a computationally efficient best basis algorithm to a set of possible local cosine bases organised in a tree structure. We develop a heuristically motivated cost function which maximises the energy of the transform coefficients associated with a particular source. Finally, we objectively evaluate the proposed transform method by comparing it against fixed-basis transforms such as the short-time Fourier transform (STFT) and the modified discrete cosine transform (MDCT). Evaluation results indicate that the proposed method outperforms the MDCT and is competitive with the STFT, and informal listening tests suggest that it exhibits less objectionable noise than the STFT.
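
The binary masking step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the mixture has already been transformed (by any linear transform) into a 2 x K array of coefficients, and that the known mixing parameters are given as a 2 x N matrix with one column per source. The assignment rule used here (each coefficient goes to the source whose unit-norm mixing direction gives the largest absolute projection) and the name `binary_mask_demix` are assumptions made for illustration; the abstract does not state the exact criterion used in the paper.

```python
import numpy as np

def binary_mask_demix(X, A):
    """Binary-mask demixing sketch (hypothetical helper, not from the paper).

    X: (2, K) transform coefficients of the two mixture channels.
    A: (2, N) known instantaneous mixing matrix, one column per source.
    Returns (S_hat, mask): (N, K) source coefficient estimates and the
    (N, K) boolean mask with exactly one True entry per coefficient index.
    """
    # Normalise each mixing direction to unit length.
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    # Project every two-channel coefficient onto every source direction.
    proj = A.T @ X                                   # shape (N, K)
    # Assumed rule: the source with the largest absolute projection wins.
    winner = np.argmax(np.abs(proj), axis=0)         # shape (K,)
    mask = np.arange(A.shape[1])[:, None] == winner[None, :]
    # Keep the winning projection, zero everything else.
    return np.where(mask, proj, 0.0), mask

# Toy example: three sources that are exactly disjoint in the transform domain.
theta = np.array([0.2, 0.8, 1.4])                    # assumed mixing angles
A = np.vstack([np.cos(theta), np.sin(theta)])        # unit-norm columns
S = np.array([[1.0,  0.0, 0.0, 2.0],
              [0.0, -3.0, 0.0, 0.0],
              [0.0,  0.0, 0.5, 0.0]])
X = A @ S                                            # two-channel mixture
S_hat, mask = binary_mask_demix(X, A)
```

When the sources are exactly disjoint, as in this toy example, each mixture coefficient lies along a single mixing direction and the mask recovers the source coefficients exactly. With real audio the representations are only approximately disjoint, which is why the choice of transform matters for separation quality.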