Unsupervised Single-Channel Music Source Separation by Average Harmonic Structure Modeling

Authors:
Zhiyao Duan; Yungang Zhang; Changshui Zhang; Zhenwei Shi
Affiliations:
Dept. of Automation, Tsinghua University, Beijing; -; -; -
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2008

Abstract

Source separation of musical signals is an appealing but difficult problem, especially in the single-channel case. This paper proposes an unsupervised single-channel music source separation algorithm based on average harmonic structure modeling. Assuming that instruments play in narrow pitch ranges, different harmonic instrumental sources in a piece of music often have different but stable harmonic structures, so each source can be characterized uniquely by a harmonic structure model. Given the number of instrumental sources, the proposed algorithm learns these models directly from the mixed signal by clustering the harmonic structures extracted from different frames, and then uses the models to extract the corresponding sources from the mixture. Experiments on several mixed signals, including synthesized instrumental sources, real instrumental sources, and singing voices, show that the algorithm outperforms the general nonnegative matrix factorization (NMF)-based source separation algorithm and yields good subjective listening quality. As a by-product, the algorithm estimates the pitches of the harmonic instrumental sources and computes the number of concurrent sounds in each frame, a task that is difficult for general multipitch estimation (MPE) algorithms.
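The learning step described in the abstract (extract a harmonic structure from each frame, then cluster the structures into one model per source) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the F0 of each note is taken as already known, a "harmonic structure" is assumed to be the vector of harmonic amplitudes in dB relative to the strongest harmonic, and plain k-means with farthest-point initialization is swapped in as the clustering method. The functions `harmonic_structure`, `learn_models`, `nearest`, `synth_peaks` and the parameters `n_harmonics`, `tol`, `floor_db` are hypothetical names chosen for this sketch.

```python
import math
from typing import List, Sequence, Tuple

# Sketch only: hypothetical helpers, not the paper's algorithm or code.
Vector = List[float]
Peak = Tuple[float, float]  # (frequency in Hz, linear amplitude)


def harmonic_structure(peaks: Sequence[Peak], f0: float, n_harmonics: int = 8,
                       tol: float = 0.03, floor_db: float = -60.0) -> Vector:
    """Assumed definition: dB amplitudes of harmonics 1..n_harmonics of f0,
    relative to the strongest harmonic.

    A harmonic with no spectral peak within +/- tol * h * f0 is set to floor_db,
    and values below floor_db are clipped to it.
    """
    amps = []
    for h in range(1, n_harmonics + 1):
        target = h * f0
        near = [a for f, a in peaks if abs(f - target) <= tol * target]
        amps.append(max(near) if near else 0.0)
    ref = max(amps)
    if ref <= 0.0:
        return [floor_db] * n_harmonics
    return [max(floor_db, 20.0 * math.log10(a / ref)) if a > 0.0 else floor_db
            for a in amps]


def _dist2(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two harmonic structures."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(s: Vector, models: List[Vector]) -> int:
    """Index of the model closest to harmonic structure s."""
    return min(range(len(models)), key=lambda k: _dist2(s, models[k]))


def learn_models(structures: List[Vector], n_sources: int,
                 n_iter: int = 50) -> List[Vector]:
    """Plain k-means (assumed stand-in): each centroid is an average harmonic structure."""
    # Deterministic farthest-point initialization.
    centers = [list(structures[0])]
    while len(centers) < n_sources:
        far = max(structures, key=lambda s: min(_dist2(s, c) for c in centers))
        centers.append(list(far))
    for _ in range(n_iter):
        groups: List[List[Vector]] = [[] for _ in centers]
        for s in structures:
            groups[nearest(s, centers)].append(s)
        # Empty clusters keep their previous centroid.
        new = [[sum(col) / len(g) for col in zip(*g)] if g else c
               for g, c in zip(groups, centers)]
        shift = max(_dist2(a, b) for a, b in zip(new, centers))
        centers = new
        if shift < 1e-12:
            break
    return centers


# Usage on synthetic, non-overlapping frames from two hypothetical instruments.
def synth_peaks(f0: float, amps: Sequence[float]) -> List[Peak]:
    return [(h * f0, a) for h, a in enumerate(amps, start=1)]


TIMBRE_A = [1.0, 0.5, 0.33, 0.25, 0.2, 0.17, 0.14, 0.12]  # smooth roll-off
TIMBRE_B = [1.0, 0.05, 0.6, 0.05, 0.4, 0.05, 0.3, 0.05]   # weak even harmonics
notes = [(f, TIMBRE_A) for f in (220.0, 247.0, 262.0)] + \
        [(f, TIMBRE_B) for f in (330.0, 349.0, 392.0)]

structures = [harmonic_structure(synth_peaks(f0, amps), f0) for f0, amps in notes]
models = learn_models(structures, n_sources=2)
labels = [nearest(s, models) for s in structures]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this toy input the three notes of each timbre fall into the same cluster despite their different pitches, which is the property the abstract relies on: stable harmonic structures within a narrow pitch range. Real mixtures also contain several sources per frame and coincident partials between sources, which this sketch does not attempt to handle.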