Single-channel speech separation based on long-short frame associated harmonic model

Authors:
Qinghua Huang;Dongmei Wang
Affiliations:
School of Communication and Information Engineering, Shanghai University, Shanghai 200072, PR China;School of Communication and Information Engineering, Shanghai University, Shanghai 200072, PR China
Venue:
Digital Signal Processing
Year:
2011

Abstract

The harmonic model is widely used in single-channel audio source separation. It has proven effective for music source separation, where the harmonic peaks of the sources differ greatly from each other. In speech analysis, however, the short time window always introduces harmonic overlap in the frequency domain. To overcome this shortcoming, we propose a long-short frame associated harmonic (LSAH) model to separate two speech sources from a single-channel recording. The long frame achieves high harmonic resolution, while the short frame preserves the short-time stationarity of the speech signal. The two are used jointly to improve the accuracy of multi-pitch estimation. The autocorrelation method is adopted to estimate the prominent pitch simply and accurately. The LSAH model and the prominent pitch are then used to judge the state of the mixture and to estimate the other pitch candidate. Our method can guarantee both high harmonic resolution and short-time stationarity of the speech signal. Furthermore, it can separate some unvoiced segments from the mixture, which many existing methods cannot handle. Experiments on 30 groups of mixtures show that the proposed algorithm outperforms the standard short-time harmonic model in terms of both signal-to-noise ratio (SNR) and subjective listening quality.
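As an illustration of the autocorrelation step mentioned in the abstract, the following is a minimal sketch of estimating a single prominent pitch from one frame. It is not the authors' implementation: the function name `autocorr_pitch`, the search range of 80 to 400 Hz, the 8 kHz sampling rate, and the synthetic test frame are all assumptions chosen for the example.

```python
import math

def autocorr_pitch(frame, fs, fmin=80.0, fmax=400.0):
    """Estimate the prominent pitch (Hz) of one frame from its autocorrelation peak.

    fmin/fmax bound the pitch search range; both are illustrative assumptions.
    Returns 0.0 if no positive autocorrelation peak is found in the range.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]            # remove DC offset
    lag_min = int(fs / fmax)                 # shortest candidate period in samples
    lag_max = min(int(fs / fmin), n - 1)     # longest candidate period in samples
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return fs / best_lag if best_lag else 0.0

# Synthetic voiced frame: 200 Hz fundamental plus a weaker second harmonic
# (hypothetical test signal, 40 ms at an assumed 8 kHz sampling rate).
fs = 8000
f0 = 200.0
frame = [math.sin(2 * math.pi * f0 * t / fs)
         + 0.5 * math.sin(2 * math.pi * 2 * f0 * t / fs)
         for t in range(320)]
pitch = autocorr_pitch(frame, fs)
print(round(pitch, 1))
```

On this synthetic frame the estimate should land close to the 200 Hz fundamental. In a two-speaker mixture, a single autocorrelation peak yields only the dominant pitch, which is why the abstract pairs it with the LSAH model to find the other pitch candidate.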