Fundamentals of speech recognition
Fundamentals of speech recognition
Comparison of approximate methods for handling hyperparameters
Neural Computation
GaP: a factor model for discrete data
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Mixtures of Gamma Priors for Non-negative Matrix Factorization Based Speech Separation
ICA '09 Proceedings of the 8th International Conference on Independent Component Analysis and Signal Separation
IEEE Transactions on Audio, Speech, and Language Processing
Conjugate gamma Markov random fields for modelling nonstationary sources
ICA'07 Proceedings of the 7th international conference on Independent component analysis and signal separation
Performance measurement in blind audio source separation
IEEE Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Audio, Speech, and Language Processing
Hi-index | 0.00 |
We introduce a new regularized nonnegative matrix factorization (NMF) method for supervised single-channel source separation (SCSS). We propose a new multi-objective cost function which includes the conventional divergence term for the NMF together with a prior likelihood term. The first term measures the divergence between the observed data and the multiplication of basis and gains matrices. The novel second term encourages the log-normalized gain vectors of the NMF solution to increase their likelihood under a prior Gaussian mixture model (GMM) which is used to encourage the gains to follow certain patterns. In this model, the parameters to be estimated are the basis vectors, the gain vectors and the parameters of the GMM prior. We introduce two different ways to train the model parameters, sequential training and joint training. In sequential training, after finding the basis and gains matrices, the gains matrix is then used to train the prior GMM in a separate step. In joint training, within each NMF iteration the basis matrix, the gains matrix and the prior GMM parameters are updated jointly using the proposed regularized NMF. The normalization of the gains makes the prior models energy independent, which is an advantage as compared to earlier proposals. In addition, GMM is a much richer prior than the previously considered alternatives such as conjugate priors which may not represent the distribution of the gains in the best possible way. In the separation stage after observing the mixed signal, we use the proposed regularized cost function with a combined basis and the GMM priors for all sources that were learned from training data for each source. Only the gain vectors are estimated from the mixed data by minimizing the joint cost function. We introduce novel update rules that solve the optimization problem efficiently for the new regularized NMF problem. This optimization is challenging due to using energy normalization and GMM for prior modeling, which makes the problem highly nonlinear and non-convex. The experimental results show that the introduced methods improve the performance of single channel source separation for speech separation and speech-music separation with different NMF divergence functions. The experimental results also show that, using the GMM prior gives better separation results than using the conjugate prior.