Recent speaker recognition/verification systems generally utilize an utterance-dependent, fixed-dimensional vector as the input feature to Bayesian classifiers. These vectors, known as i-Vectors, are lower-dimensional representations of Gaussian Mixture Model (GMM) mean super-vectors adapted from a Universal Background Model (UBM) using speech utterance features, and are extracted within a Factor Analysis (FA) framework. The method rests on the assumption that the speaker-dependent information resides in a lower-dimensional subspace. In this study, we utilize a mixture of Acoustic Factor Analyzers (AFA) to model the acoustic features instead of a GMM-UBM. Following our previously proposed AFA technique ("Acoustic factor analysis for robust speaker verification," by Hasan and Hansen, IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 4, April 2013), this model assumes that the speaker-relevant information lies in a lower-dimensional subspace of the multi-dimensional feature space localized by the mixture components. Unlike our previous method, here we train the AFA-UBM model directly from the data using an Expectation-Maximization (EM) algorithm. This approach shows improved robustness to noise, as the nuisance dimensions are removed in each EM iteration. Two variants of the AFA model are considered, utilizing an isotropic and a diagonal-covariance residual term, respectively. The method is integrated within a standard i-Vector system, where the hidden variables of the model, termed acoustic factors, are utilized as the input for total variability modeling. Experimental results obtained on the 2012 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) core-extended trials indicate the effectiveness of the proposed strategy in both clean and noisy conditions.
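The abstract does not give the extraction formula, but the "acoustic factors" it describes are the posterior means of the latent variables of a factor analyzer. The following is a minimal illustrative sketch, not the authors' implementation: it computes, for a single factor-analyzer component with hypothetical parameters (mean `mu`, loading matrix `W`, residual covariance `psi`; all dimensions invented for illustration), the posterior mean E[z|x] under the standard FA model x = mu + W z + eps, with z ~ N(0, I) and eps ~ N(0, diag(psi)). The diagonal `psi` corresponds to the diagonal-covariance AFA variant; the isotropic variant would use a constant diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): 20-dim acoustic
# features, one factor-analyzer component with a 5-dim latent subspace.
D, Q, N = 20, 5, 100

# Hypothetical component parameters (in the paper these would be
# learned by EM): mean mu, loading matrix W (D x Q), and the diagonal
# of the residual covariance Psi.
mu = rng.normal(size=D)
W = rng.normal(size=(D, Q)) * 0.5
psi = np.full(D, 0.1)

X = rng.normal(size=(N, D))  # stand-in for acoustic feature frames

def acoustic_factors(X, mu, W, psi):
    """Posterior mean of the latent factors z given features x:
        E[z|x] = (I + W^T Psi^{-1} W)^{-1} W^T Psi^{-1} (x - mu)
    for the FA model x = mu + W z + eps, eps ~ N(0, diag(psi)).
    """
    WtPinv = W.T / psi                      # Q x D, i.e. W^T Psi^{-1}
    M = np.eye(W.shape[1]) + WtPinv @ W     # Q x Q posterior precision
    return np.linalg.solve(M, WtPinv @ (X - mu).T).T  # N x Q

Z = acoustic_factors(X, mu, W, psi)
print(Z.shape)  # (100, 5)
```

In the system the abstract describes, these per-frame latent posteriors (gathered over all mixture components) would replace the raw features as input to total variability modeling; the sketch only illustrates the single-component projection from the D-dimensional feature space down to the Q-dimensional factor space.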