Separation of Singing Voice From Music Accompaniment for Monaural Recordings

Authors:
Yipeng Li; DeLiang Wang
Affiliations:
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Abstract

Separating singing voice from music accompaniment is useful in many applications, such as lyrics recognition and alignment, singer identification, and music information retrieval. Although speech separation has been studied extensively for decades, singing voice separation has received little attention. We propose a system to separate singing voice from music accompaniment in monaural recordings. The system consists of three stages. The singing voice detection stage partitions an input into portions and classifies each as vocal or nonvocal. For vocal portions, the predominant pitch detection stage detects the pitch of the singing voice, and the separation stage then uses the detected pitch to group the time-frequency segments of the singing voice. Quantitative results show that the system performs the separation task successfully.
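
The three stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name, parameter, and threshold in it is hypothetical. The abstract does not say how each stage works internally, so the sketch substitutes simple stand-ins: an energy threshold for vocal/nonvocal classification, an autocorrelation peak for predominant pitch detection, and a binary mask over FFT bins near the harmonics of the detected pitch for grouping time-frequency units.

```python
import numpy as np

def is_vocal(frame, energy_thresh):
    # Stage 1 stand-in (assumption): call a frame "vocal" if its mean
    # power exceeds a threshold. A real system would use a classifier.
    return np.mean(frame ** 2) > energy_thresh

def estimate_pitch(frame, sr, fmin=80.0, fmax=500.0):
    # Stage 2 stand-in (assumption): pick the autocorrelation peak
    # within the lag range that corresponds to [fmin, fmax].
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return sr / lag

def harmonic_mask(freqs, f0, tol_hz):
    # Stage 3 stand-in (assumption): keep bins lying within tol_hz of
    # an integer multiple of the detected pitch f0.
    k = np.round(freqs / f0)
    return (k >= 1) & (np.abs(freqs - k * f0) <= tol_hz)

def separate_frames(x, sr, frame_len=1024, hop=512,
                    energy_thresh=0.1, tol_hz=20.0):
    # Run the three stages frame by frame. Returns per-frame pitch
    # (0 for nonvocal frames) and the masked complex spectrogram.
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    n_frames = 1 + (len(x) - frame_len) // hop
    pitches = np.zeros(n_frames)
    vocal_spec = np.zeros((n_frames, len(freqs)), dtype=complex)
    for i in range(n_frames):
        frame = x[i * hop:i * hop + frame_len]
        if not is_vocal(frame, energy_thresh):
            continue  # nonvocal frames contribute nothing
        f0 = estimate_pitch(frame, sr)
        pitches[i] = f0
        spec = np.fft.rfft(frame * window)
        vocal_spec[i] = spec * harmonic_mask(freqs, f0, tol_hz)
    return pitches, vocal_spec

# Synthetic demo: a 1 kHz "accompaniment" tone throughout, plus a
# harmonic 220 Hz "voice" that enters halfway through the signal.
sr = 16000
t = np.arange(sr) / sr
accomp = 0.2 * np.sin(2 * np.pi * 1000 * t)
voice = (np.sin(2 * np.pi * 220 * t)
         + 0.5 * np.sin(2 * np.pi * 440 * t)
         + 0.3 * np.sin(2 * np.pi * 660 * t))
voice[: sr // 2] = 0.0
pitches, vocal_spec = separate_frames(accomp + voice, sr)
```

In the synthetic demo, frames before the voice enters are skipped, and in fully voiced frames the mask keeps bins near multiples of the estimated pitch while discarding the 1 kHz tone. Real recordings have broadband, time-varying accompaniment that often overlaps the vocal harmonics, so each stand-in would need to be replaced with a far more robust method.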