Singing Voice Enhancement in Monaural Music Signals Based on Two-stage Harmonic/Percussive Sound Separation on Multiple Resolution Spectrograms

Authors:
Hideyuki Tachibana;Nobutaka Ono;Shigeki Sagayama
Affiliations:
Grad. Sch. of Inf. Sci. & Technol., Univ. of Tokyo, Tokyo, Japan;Nat. Inst. of Inf., Tokyo, Japan;Nat. Inst. of Inf., Tokyo, Japan
Venue:
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Year:
2014

Citing 11
Cited 0

Music information retrieval from a singing voice using lyrics and melody information

EURASIP Journal on Applied Signal Processing
On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset

IEEE Transactions on Audio, Speech, and Language Processing
Source/filter model for unsupervised main melody extraction from polyphonic audio signals

IEEE Transactions on Audio, Speech, and Language Processing
Automatic recognition of lyrics in singing

EURASIP Journal on Audio, Speech, and Music Processing - Special issue on atypical speech
Vocal melody extraction in the presence of pitched accompaniment in polyphonic music

IEEE Transactions on Audio, Speech, and Language Processing
Performance measurement in blind audio source separation

IEEE Transactions on Audio, Speech, and Language Processing
Multipitch Analysis of Polyphonic Music and Speech Signals Using an Auditory Model

IEEE Transactions on Audio, Speech, and Language Processing
Normalized Cuts for Predominant Melodic Source Separation

IEEE Transactions on Audio, Speech, and Language Processing
Separation of Singing Voice From Music Accompaniment for Monaural Recordings

IEEE Transactions on Audio, Speech, and Language Processing
Melody Transcription From Music Audio: Approaches and Evaluation

IEEE Transactions on Audio, Speech, and Language Processing
Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs

IEEE Transactions on Audio, Speech, and Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose a novel singing voice enhancement technique for monaural music audio signals, which is a quite challenging problem. Many singing voice enhancement techniques have been proposed recently. However, our approach is based on a quite different idea from these existing methods. We focused on the fluctuation of a singing voice and considered to detect it by exploiting two differently resolved spectrograms, one has rich temporal resolution and poor frequency resolution, while the other has rich frequency resolution and poor temporal resolution. On such two spectrograms, the shapes of fluctuating components are quite different. Based on this idea, we propose a singing voice enhancement technique that we call two-stage harmonic/percussive sound separation (HPSS). In this paper, we describe the details of two-stage HPSS and evaluate the performance of the method. The experimental results show that SDR, a commonly-used criterion on the task, was improved by around 4 dB, which is a considerably higher level than existing methods. In addition, we also evaluated the performance of the method as a preprocessing for melody estimation in music. The experimental results show that our singing voice enhancement technique considerably improved the performance of a simple pitch estimation technique. These results prove the effectiveness of the proposed method.