Low-Latency instrument separation in polyphonic audio using timbre models

  • Authors:
  • Ricard Marxer; Jordi Janer; Jordi Bonada

  • Affiliations:
  • Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain (all authors)

  • Venue:
  • LVA/ICA'12 Proceedings of the 10th international conference on Latent Variable Analysis and Signal Separation
  • Year:
  • 2012

Abstract

This research addresses the removal of the singing voice from polyphonic audio recordings under real-time constraints. The method builds time-frequency binary masks by combining two components: a classification of spectral bins by azimuth, phase difference and absolute frequency, and harmonic-derived masks. For the harmonic-derived masks, a pitch likelihood estimation technique based on Tikhonov regularization is proposed. Pitch tracking of the target instrument relies on supervised timbre models. The approach runs in real time on off-the-shelf computers with a latency below 250 ms. It was compared with a state-of-the-art offline Non-negative Matrix Factorization (NMF) technique and with ideal binary mask separation. The evaluation used a dataset of multi-track versions of professional audio recordings.
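
To make the Tikhonov-regularized pitch likelihood step more concrete, here is a minimal sketch in plain Python. The abstract does not give the formulation, so everything here is an assumption for illustration: the observed magnitude spectrum is modelled as a linear combination of toy harmonic combs (one per pitch candidate), and the weights are found by minimising ||A x - b||^2 + lam * ||x||^2. The names `harmonic_comb`, `solve_linear`, `tikhonov_pitch_likelihood` and the parameter `lam` are hypothetical and do not come from the paper.

```python
def harmonic_comb(f0, bin_freqs, tol=1e-6):
    """Toy spectral template (assumed): weight 1/h at the bin of the h-th harmonic of f0."""
    comb = [0.0] * len(bin_freqs)
    for i, f in enumerate(bin_freqs):
        h = round(f / f0)
        if h >= 1 and abs(f - h * f0) < tol:
            comb[i] = 1.0 / h
    return comb


def solve_linear(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def tikhonov_pitch_likelihood(spectrum, combs, lam=0.1):
    """Minimise ||A x - b||^2 + lam * ||x||^2, where the columns of A are the combs.

    The closed-form solution solves (A^T A + lam * I) x = A^T b.
    """
    n = len(combs)
    gram = [
        [sum(p * q for p, q in zip(combs[i], combs[j])) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    rhs = [sum(p * q for p, q in zip(combs[i], spectrum)) for i in range(n)]
    return solve_linear(gram, rhs)


# Toy data (assumed): 16 bins spaced 50 Hz apart and three pitch candidates.
bin_freqs = [50.0 * k for k in range(1, 17)]
candidates = [100.0, 150.0, 200.0]
combs = [harmonic_comb(f0, bin_freqs) for f0 in candidates]

# The "observed" spectrum is exactly the 150 Hz template.
observed = combs[1][:]
likelihood = tikhonov_pitch_likelihood(observed, combs)
best_f0 = candidates[max(range(len(candidates)), key=lambda i: likelihood[i])]
```

In this toy setup the largest weight falls on the 150 Hz candidate, even though its harmonics overlap with those of the 100 Hz and 200 Hz combs. The regularization term keeps the system well conditioned when candidate templates overlap heavily, at the cost of shrinking the weights slightly toward zero.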