This article describes a general and powerful approach to modelling mismatch in speaker recognition by including an explicit session term in the Gaussian mixture speaker modelling framework. Under this approach, the Gaussian mixture model (GMM) that best represents the observations of a particular recording is the combination of the true speaker model and an additional session-dependent offset constrained to lie in a low-dimensional subspace representing session variability. A novel and efficient training procedure is proposed to perform the simultaneous optimisation of the speaker model and session variables required for speaker training. Using an iterative approach similar to the Gauss-Seidel method for solving linear systems, this procedure greatly reduces the memory and computational resources required compared with a direct solution. Extensive experimentation demonstrates that explicit session modelling provides up to a 68% reduction in detection cost over a standard GMM-based system and significant improvements over a system utilising feature mapping. The approach is also shown to be effective on the corpora of recent National Institute of Standards and Technology (NIST) Speaker Recognition Evaluations, which exhibit different session mismatch conditions.
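For readers unfamiliar with the Gauss-Seidel method mentioned above, the following is a minimal sketch of the classical iteration for a small linear system. It is not the article's training procedure. The function name `gauss_seidel`, the matrix `A`, and the vector `b` are illustrative assumptions. The point it shows is that each unknown is updated in turn using the most recent values of the others, so the full system never has to be solved directly.

```python
def gauss_seidel(A, b, iterations=50):
    """Approximately solve A x = b with Gauss-Seidel sweeps.

    A is a square matrix (list of lists) with a nonzero diagonal.
    Convergence is guaranteed when A is strictly diagonally dominant
    or symmetric positive definite.
    """
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(iterations):
        for i in range(n):
            # Sum over the other unknowns, using values already
            # updated earlier in this same sweep.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x


# Hypothetical, strictly diagonally dominant example system.
A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
b = [5.0, 8.0, 8.0]
x = gauss_seidel(A, b)
```

In this toy system the exact solution is the all-ones vector, and the sweeps converge towards it. The article applies a similar alternating idea to the joint estimation of speaker and session variables, where the abstract states that it reduces memory and computation relative to a direct solution.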