Single channel music sound separation based on spectrogram decomposition and note classification

Authors:
Wenwu Wang;Hafiz Mustafa
Affiliations:
Centre for Vision, Speech and Signal Processing, University of Surrey, UK;Centre for Vision, Speech and Signal Processing, University of Surrey, UK
Venue:
CMMR'10 Proceedings of the 7th international conference on Exploring music contents
Year:
2010

Abstract

Separating multiple music sources from a single-channel mixture is a challenging problem. We present a new approach to this problem based on non-negative matrix factorization (NMF) and note classification, assuming that the instruments present in the mixture are known a priori. The spectrogram of the mixture signal is first decomposed into building components (musical notes) using an NMF algorithm. The Mel-frequency cepstral coefficients (MFCCs) of both the decomposed components and the signals in the training dataset are then extracted. The mean squared errors (MSEs) between the MFCC features of each decomposed component and those of the training signals serve as the similarity measure for the decomposed notes. Each note is then assigned to the corresponding instrument type by the K-nearest neighbors (K-NN) classification algorithm based on these MSEs. Finally, the source signals are reconstructed from the classified notes and the weighting matrices obtained from the NMF algorithm. Simulations are provided to show the performance of the proposed system.
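
The pipeline described in the abstract (decompose, classify each component, regroup) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions. The abstract does not say which NMF variant is used, so the sketch takes the standard multiplicative updates for the squared Euclidean cost. The MFCC extraction is replaced by a hypothetical stand-in, `unit_norm_feature`, to keep the example self-contained. All function and parameter names (`nmf`, `knn_label`, `separate`, `rank`, `k`) are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (frequency x time) as W @ H.

    Assumption: multiplicative updates for the squared Euclidean cost;
    the abstract does not specify which NMF algorithm is used.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps   # spectral bases ("notes")
    H = rng.random((rank, V.shape[1])) + eps   # time-varying weights
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def knn_label(feature, train_features, train_labels, k=3):
    """Majority vote among the k training vectors with the smallest MSE."""
    mse = np.mean((train_features - feature) ** 2, axis=1)
    nearest = np.argsort(mse)[:k]
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def unit_norm_feature(spectrum):
    """Hypothetical stand-in for MFCC extraction: a unit-norm spectrum."""
    return spectrum / (np.linalg.norm(spectrum) + 1e-12)

def separate(V, rank, featurize, train_features, train_labels, k=3):
    """Decompose V, label each component, and regroup by instrument."""
    W, H = nmf(V, rank)
    labels = np.array([
        knn_label(featurize(W[:, j]), train_features, train_labels, k)
        for j in range(rank)
    ])
    # Each source is rebuilt from the bases and weights assigned to it.
    sources = {
        lab: W[:, labels == lab] @ H[labels == lab, :]
        for lab in np.unique(train_labels)
    }
    return sources, labels

# Toy usage: two synthetic "instruments" with disjoint spectral support.
rng = np.random.default_rng(1)
W_true = np.zeros((8, 2))
W_true[:4, 0] = 1.0   # "low" instrument occupies the lower bins
W_true[4:, 1] = 1.0   # "high" instrument occupies the upper bins
V_mix = W_true @ rng.random((2, 20))
train_f = np.stack([unit_norm_feature(W_true[:, 0]),
                    unit_norm_feature(W_true[:, 1])])
train_l = np.array(["low", "high"])
sources, labels = separate(V_mix, 2, unit_norm_feature, train_f, train_l, k=1)
```

Here `sources` holds magnitude-spectrogram estimates only. Recovering time-domain signals would additionally require a phase estimate (for example, reusing the mixture phase) and an inverse short-time Fourier transform, which the sketch leaves out.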