Recent speaker recognition/verification systems generally utilize an utterance-dependent, fixed-dimensional vector as the input feature to Bayesian classifiers. These vectors, known as i-Vectors, are lower-dimensional representations of Gaussian Mixture Model (GMM) mean super-vectors adapted from a Universal Background Model (UBM) using speech utterance features, and are extracted within a Factor Analysis (FA) framework. The method rests on the assumption that the speaker-dependent information resides in a lower-dimensional subspace. In this study, we utilize a mixture of Acoustic Factor Analyzers (AFA) to model the acoustic features instead of a GMM-UBM. Following our previously proposed AFA technique ("Acoustic factor analysis for robust speaker verification," by Hasan and Hansen, IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 4, April 2013), this model assumes that the speaker-relevant information lies in a lower-dimensional subspace of the multi-dimensional feature space localized by the mixture components. Unlike our previous method, here we train the AFA-UBM model directly from the data using an Expectation-Maximization (EM) algorithm. This approach shows improved robustness to noise, as the nuisance dimensions are removed in each EM iteration. Two variants of the AFA model are considered, utilizing an isotropic and a diagonal-covariance residual term, respectively. The method is integrated within a standard i-Vector system, where the hidden variables of the model, termed acoustic factors, are utilized as the input for total variability modeling. Experimental results obtained on the 2012 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) core-extended trials indicate the effectiveness of the proposed strategy in both clean and noisy conditions.
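The abstract does not give the extraction formula, but the "acoustic factors" it describes are the posterior means of the latent variables of a factor analyzer. The following is a minimal illustrative sketch, not the authors' implementation: it computes, for a single factor-analyzer component with hypothetical parameters (mean `mu`, loading matrix `W`, residual covariance `psi`; all dimensions invented for illustration), the posterior mean E[z|x] under the standard FA model x = mu + W z + eps, with z ~ N(0, I) and eps ~ N(0, diag(psi)). The diagonal `psi` corresponds to the diagonal-covariance AFA variant; the isotropic variant would use a constant diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the paper): 20-dim acoustic
# features, one factor-analyzer component with a 5-dim latent subspace.
D, Q, N = 20, 5, 100

# Hypothetical component parameters (in the paper these would be
# learned by EM): mean mu, loading matrix W (D x Q), and the diagonal
# of the residual covariance Psi.
mu = rng.normal(size=D)
W = rng.normal(size=(D, Q)) * 0.5
psi = np.full(D, 0.1)

X = rng.normal(size=(N, D))  # stand-in for acoustic feature frames

def acoustic_factors(X, mu, W, psi):
    """Posterior mean of the latent factors z given features x:
        E[z|x] = (I + W^T Psi^{-1} W)^{-1} W^T Psi^{-1} (x - mu)
    for the FA model x = mu + W z + eps, eps ~ N(0, diag(psi)).
    """
    WtPinv = W.T / psi                      # Q x D, i.e. W^T Psi^{-1}
    M = np.eye(W.shape[1]) + WtPinv @ W     # Q x Q posterior precision
    return np.linalg.solve(M, WtPinv @ (X - mu).T).T  # N x Q

Z = acoustic_factors(X, mu, W, psi)
print(Z.shape)  # (100, 5)
```

In the system the abstract describes, these per-frame latent posteriors (gathered over all mixture components) would replace the raw features as input to total variability modeling; the sketch only illustrates the single-component projection from the D-dimensional feature space down to the Q-dimensional factor space.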