Low-latency instrument separation in polyphonic audio using timbre models

Authors:
Ricard Marxer;Jordi Janer;Jordi Bonada
Affiliations:
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain;Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain;Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
Venue:
LVA/ICA'12 Proceedings of the 10th international conference on Latent Variable Analysis and Signal Separation
Year:
2012

Abstract

This research addresses the removal of the singing voice from polyphonic audio recordings under real-time constraints. The approach is based on time-frequency binary masks that combine two sources of information: a classification of spectral bins by azimuth, phase difference and absolute frequency, and harmonic-derived masks. For the harmonic-derived masks, a pitch likelihood estimation technique based on Tikhonov regularization is proposed. A method for tracking the pitch of the target instrument makes use of supervised timbre models. The approach runs in real time on off-the-shelf computers with a latency below 250 ms. The method was compared to a state-of-the-art offline Non-negative Matrix Factorization (NMF) technique and to ideal binary mask separation. The evaluation used a dataset of multi-track versions of professional audio recordings.
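
As an illustration of the ideal binary mask mentioned above as a reference separation, the following is a minimal sketch and not the authors' implementation. It assumes the magnitude spectrograms of the isolated voice and of the accompaniment are available (as they would be with multi-track data); the function names and the toy values are hypothetical.

```python
import numpy as np

def ideal_binary_mask(target_mag, residual_mag):
    """Return 1.0 in each time-frequency bin where the target is louder
    than the residual, and 0.0 elsewhere (ties go to the residual)."""
    return (target_mag > residual_mag).astype(float)

def apply_mask(mixture_spec, mask):
    """Keep only the bins selected by the mask."""
    return mixture_spec * mask

# Toy magnitude spectrograms: 2 frequency bins x 3 frames (made-up values).
voice = np.array([[1.0, 0.2, 0.0],
                  [0.1, 0.9, 0.5]])
accompaniment = np.array([[0.3, 0.8, 0.4],
                          [0.6, 0.1, 0.5]])

# Assumption for the sketch: the mixture is the plain sum of both sources.
mixture = (voice + accompaniment).astype(complex)

mask = ideal_binary_mask(voice, accompaniment)

# Voice removal keeps the complement of the voice mask.
voice_estimate = apply_mask(mixture, mask)
accompaniment_estimate = apply_mask(mixture, 1.0 - mask)
```

Because the mask is binary, every bin of the mixture is assigned to exactly one of the two estimates, so the two estimates always sum back to the mixture.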