On sparse and low-rank matrix decomposition for singing voice separation

Authors:
Yi-Hsuan Yang
Affiliations:
Research Center for IT Innovation, Academia Sinica, Taipei, Taiwan, ROC
Venue:
Proceedings of the 20th ACM international conference on Multimedia
Year:
2012

Abstract

In recent years there has been growing interest in transforming signals and matrices into sparse or low-rank representations, i.e., representations that are sparse in support or of low redundancy. Such decompositions are proving particularly powerful for a variety of signal processing and compression problems. In this paper, we investigate the application of this technique to the challenging task of separating the singing voice from the accompaniment in popular music. The vocal part is modeled as a sparse signal, whereas the instrumental part is considered to be low-rank. In addition, to better account for the particular properties of music, two new algorithms are proposed to improve the decomposition: the incorporation of harmonicity priors and a back-end drum removal procedure. Evaluations on the MIR-1K benchmark dataset show that the proposed algorithms outperform the state of the art by 0.01 to 2.41 dB.
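To make the sparse plus low-rank model concrete, the following is a minimal sketch of a generic robust PCA decomposition M = L + S, solved with a standard augmented-Lagrangian iteration. It is not the paper's implementation: the abstract does not specify the solver, and the harmonicity priors and drum removal step are not modeled here. The function names (`soft_threshold`, `svd_threshold`, `rpca`) and the default values for `lam` and `mu` are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Singular value shrinkage: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S so that M is approximately L + S.

    Illustrative defaults (assumptions, not taken from the paper):
    lam = 1 / sqrt(max(m, n)) and mu = m * n / (4 * sum(|M|)).
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))
    if mu is None:
        mu = 0.25 * m * n / (np.abs(M).sum() + 1e-12)

    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint M = L + S
    norm_M = np.linalg.norm(M)  # Frobenius norm

    for _ in range(max_iter):
        # Low-rank update: shrink singular values.
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        # Sparse update: shrink individual entries.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        # Dual update on the constraint residual.
        R = M - L - S
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S
```

In a separation setting, one would typically pass in a non-negative magnitude spectrogram as `M` (a float array of shape frequency bins by frames), treat `S` as the vocal estimate and `L` as the accompaniment estimate, and resynthesize each using the phase of the mixture. That pipeline is an assumption about how such a decomposition is commonly applied, not a description of the evaluated system.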