The harmonic model is widely used in single-channel audio source separation. It has proven effective for music source separation, where the harmonic peaks of the sources differ greatly from one another. In speech analysis, however, the short time window inevitably introduces harmonic overlap in the frequency domain. To overcome this shortcoming, we propose a long-short frame associated harmonic (LSAH) model to separate two speech sources from a single-channel recording. The long frame provides high harmonic resolution, while the short frame preserves the short-time stationarity of the speech signal. The two are used jointly to improve the accuracy of multi-pitch estimation. An autocorrelation method is adopted to estimate the prominent pitch simply and accurately. The LSAH model and the prominent pitch are then used to judge the state of the mixture and to estimate the other pitch candidate. The method thus retains both the high harmonic resolution and the short-time stationarity of the speech signal. Furthermore, it can separate some unvoiced segments from the mixture, which many existing methods cannot handle. Experiments on 30 groups of mixtures show that the proposed algorithm outperforms the standard short-time harmonic model in terms of both signal-to-noise ratio (SNR) and subjective listening quality.
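Two ideas in the abstract can be illustrated with a minimal sketch: autocorrelation-based estimation of a single prominent pitch, and the trade-off between frame length and harmonic resolution. This is not the paper's implementation. The function names, the 80-400 Hz search range, the 16 kHz sample rate, and the 512- and 2048-sample frame lengths are all assumptions chosen for illustration.

```python
import numpy as np


def estimate_pitch_autocorr(frame, sample_rate, f_min=80.0, f_max=400.0):
    """Estimate the dominant pitch (Hz) of one frame by autocorrelation.

    Hypothetical helper: the search range f_min..f_max is an assumed
    range for voiced speech, not a value taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove DC so it does not bias the peak
    # Full autocorrelation; keep only the non-negative lags.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(acf) - 1)
    # The lag with the largest autocorrelation inside the search range
    # is taken as the pitch period in samples.
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag


def frequency_resolution(frame_length, sample_rate):
    """DFT bin spacing in Hz: longer frames give finer resolution."""
    return sample_rate / frame_length


# Synthetic voiced frame: 200 Hz fundamental plus two weaker harmonics.
sample_rate = 16000
t = np.arange(512) / sample_rate
frame = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in (1, 2, 3))

pitch = estimate_pitch_autocorr(frame, sample_rate)
short_res = frequency_resolution(512, sample_rate)   # assumed short frame
long_res = frequency_resolution(2048, sample_rate)   # assumed long frame
```

With these assumed lengths, the long frame has a bin spacing four times finer than the short one. That is why a long frame can resolve closely spaced harmonics from two speakers, while the short frame remains the better match for the short-time stationarity of speech.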