Monaural singing voice separation is an extremely challenging problem. While pitch-based inference methods have led to considerable progress in separating the voiced singing voice, little attention has been paid to their inability to separate the unvoiced singing voice, which has an inharmonic structure and weaker energy. In this paper, we propose a systematic approach to identify and separate the unvoiced singing voice from the music accompaniment, and we further enhance the separation of the voiced singing voice via spectral subtraction. The proposed system follows the framework of computational auditory scene analysis (CASA), which consists of a segmentation stage and a grouping stage. In the segmentation stage, the input song signal is decomposed into small sensory elements at different time-frequency resolutions; the unvoiced sensory elements are then identified by Gaussian mixture models. Experimental results demonstrate that the quality of the separated singing voice improves for both the unvoiced and voiced parts. Moreover, to address the lack of a publicly available dataset for singing voice separation, we have constructed a corpus called MIR-1K (Multimedia Information Retrieval lab, 1000 song clips), in which all singing voices and music accompaniments were recorded separately. Each song clip comes with human-labeled pitch values, unvoiced sounds, vocal/nonvocal segments, and lyrics, as well as a speech recording of the lyrics.
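The spectral-subtraction step used to enhance the voiced singing can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, spectral floor, and the function name `spectral_subtract` are all assumptions, and the per-bin accompaniment magnitude estimate `noise_est` would in practice come from accompaniment-dominated time-frequency units rather than being given directly.

```python
import numpy as np

def spectral_subtract(mixture, noise_est, frame_len=1024, hop=512, floor=0.002):
    """Hedged sketch of magnitude spectral subtraction.

    mixture   : 1-D array, the mixed signal.
    noise_est : per-bin magnitude estimate of the interference
                (length frame_len // 2 + 1).
    Returns the enhanced signal, resynthesized by overlap-add.
    """
    window = np.hanning(frame_len)
    out = np.zeros(len(mixture))
    for start in range(0, len(mixture) - frame_len + 1, hop):
        frame = mixture[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        mag, phase = np.abs(spec), np.angle(spec)
        # Subtract the interference magnitude; clamp with a spectral
        # floor to avoid negative magnitudes (and reduce musical noise).
        clean_mag = np.maximum(mag - noise_est, floor * mag)
        # Resynthesize with the mixture phase and a synthesis window.
        out[start:start + frame_len] += (
            np.fft.irfft(clean_mag * np.exp(1j * phase)) * window
        )
    return out
```

As a usage example, a 440 Hz tone mixed with white noise can stand in for voiced singing over accompaniment: estimating `noise_est` from noise-only frames and subtracting it leaves the tone largely intact while attenuating the noise floor.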