Single-channel speech separation based on long-short frame associated harmonic model

  • Authors:
  • Qinghua Huang; Dongmei Wang

  • Affiliations:
  • School of Communication and Information Engineering, Shanghai University, Shanghai 200072, PR China (both authors)

  • Venue:
  • Digital Signal Processing
  • Year:
  • 2011

Abstract

The harmonic model is widely used in single-channel audio source separation. It has proven effective for music source separation, where the harmonic peaks of the different sources differ greatly from one another. In speech analysis, however, the short time window limits frequency resolution, so the harmonics of the two sources inevitably overlap in the frequency domain. To overcome this shortcoming, we propose a long-short frame associated harmonic (LSAH) model to separate two speech sources from a single-channel recording. The long frame achieves high harmonic resolution, while the short frame preserves the short-time stationarity of the speech signal; the two are used jointly to improve the accuracy of multi-pitch estimation. The autocorrelation method is adopted to estimate the prominent pitch because it is simple and accurate. The LSAH model and the prominent pitch are then used to determine the state of the mixture and to estimate the other pitch candidate. Our method thus guarantees both high harmonic resolution and short-time stationarity of the speech signal. Furthermore, it can separate some unvoiced segments from the mixture, which many existing methods cannot handle. Experiments on 30 groups of mixtures show that the proposed algorithm outperforms the standard short-time harmonic model in terms of both signal-to-noise ratio (SNR) and subjective listening quality.
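The abstract rests on two concrete ingredients: frequency resolution that depends on frame length, and autocorrelation-based estimation of the prominent pitch. The Python sketch below illustrates both under stated assumptions. It is not the authors' implementation, and the LSAH model itself is not reproduced. The 16 kHz sample rate, the frame lengths `SHORT_LEN` and `LONG_LEN`, the 60 to 400 Hz pitch search range, and the function names are hypothetical choices made for illustration.

```python
import math

# Hypothetical analysis settings, not taken from the paper.
FS = 16000        # sample rate in Hz
SHORT_LEN = 256   # short frame: 16 ms at 16 kHz
LONG_LEN = 2048   # long frame: 128 ms at 16 kHz


def bin_spacing_hz(fs, frame_len):
    """DFT bin spacing fs / N; a smaller spacing gives finer harmonic resolution."""
    return fs / frame_len


def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the dominant pitch of one frame from its autocorrelation peak.

    Uses the biased autocorrelation (divided by len(frame)); its gradual decay
    with lag favors the fundamental period over integer multiples of it.
    The fmin/fmax search range is an assumed default, not from the paper.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove the DC offset
    lag_min = max(1, int(fs / fmax))  # shortest period searched
    lag_max = min(int(fs / fmin), n - 1)  # longest period searched
    best_lag, best_r = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / n
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag  # convert the best lag (samples) to Hz


if __name__ == "__main__":
    print(bin_spacing_hz(FS, SHORT_LEN))  # 62.5
    print(bin_spacing_hz(FS, LONG_LEN))   # 7.8125
    # Synthetic voiced frame: 150 Hz fundamental plus a weaker 2nd harmonic.
    frame = [math.sin(2 * math.pi * 150 * k / FS)
             + 0.5 * math.sin(2 * math.pi * 300 * k / FS)
             for k in range(1024)]
    print(round(autocorr_pitch(frame, FS), 1))
```

The bin-spacing lines make the abstract's trade-off concrete: at the assumed 16 kHz rate, the long frame resolves spectral components about 7.8 Hz apart, while the short frame's 62.5 Hz bins can merge neighboring harmonics of two speakers whose pitches are close. Because the pitch estimate is quantized to an integer lag, it lands within a few hertz of the true fundamental rather than exactly on it.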