Explicit modelling of session variability for speaker verification

Authors:
Robbie Vogt;Sridha Sridharan
Affiliations:
Speech and Audio Research Laboratory, Queensland University of Technology, 2 George Street, Brisbane, Australia;Speech and Audio Research Laboratory, Queensland University of Technology, 2 George Street, Brisbane, Australia
Venue:
Computer Speech and Language
Year:
2008

Citing 2
Cited 11

Introduction to statistical pattern recognition (2nd ed.)
Numerical recipes in C (2nd ed.): the art of scientific computing

Text-independent speaker recognition using graph matching

Pattern Recognition Letters
Comparative evaluation of maximum a posteriori vector quantization and Gaussian mixture models in speaker verification

Pattern Recognition Letters
Minimising Speaker Verification Utterance Length through Confidence Based Early Verification Decisions

ICB '09 Proceedings of the Third International Conference on Advances in Biometrics
An overview of text-independent speaker recognition: From features to supervectors

Speech Communication
Effects of time lapse on speaker recognition results

DSP'09 Proceedings of the 16th international conference on Digital Signal Processing
Making confident speaker verification decisions with minimal speech

IEEE Transactions on Audio, Speech, and Language Processing
A comparison of session variability compensation approaches for speaker verification

IEEE Transactions on Information Forensics and Security
Modeling nuisance variabilities with factor analysis for GMM-based audio pattern classification

Computer Speech and Language
Text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram

Pattern Recognition
Comparison between supervised and unsupervised learning of probabilistic linear discriminant analysis mixture models for speaker verification

Pattern Recognition Letters
Improving short utterance i-vector speaker verification using utterance variance modelling and compensation techniques

Speech Communication

Abstract

This article describes a general and powerful approach to modelling mismatch in speaker recognition by including an explicit session term in the Gaussian mixture speaker modelling framework. Under this approach, the Gaussian mixture model (GMM) that best represents the observations of a particular recording is the combination of the true speaker model with an additional session-dependent offset constrained to lie in a low-dimensional subspace representing session variability. A novel and efficient model training procedure is proposed to perform the simultaneous optimisation of the speaker model and session variables required for speaker training. Using an iterative approach similar to the Gauss-Seidel method for solving linear systems, this procedure greatly reduces the memory and computational resources relative to those required by a direct solution. Extensive experimentation demonstrates that explicit session modelling provides up to a 68% reduction in detection cost over a standard GMM-based system, as well as significant improvements over a system utilising feature mapping. The approach is also shown to be effective on the corpora of recent National Institute of Standards and Technology (NIST) Speaker Recognition Evaluations, which exhibit different session mismatch conditions.
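
For readers unfamiliar with the Gauss-Seidel method mentioned in the abstract, the following is a minimal sketch of the classical iteration for a linear system A x = b. It is not the training procedure proposed in the article, only an illustration of the general idea of updating one unknown at a time while reusing the most recent values of the others. The function name, the tolerance and the small example system are assumptions chosen for illustration.

```python
def gauss_seidel(A, b, num_iters=200, tol=1e-12):
    """Classical Gauss-Seidel sweeps for A x = b (illustrative sketch only).

    A is a list of rows, b a list of right-hand-side values. Convergence is
    guaranteed for, e.g., strictly diagonally dominant matrices.
    """
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(num_iters):
        max_change = 0.0
        for i in range(n):
            # Entries x[j] with j < i were already updated in this sweep;
            # entries with j > i still hold values from the previous sweep.
            sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_xi = (b[i] - sigma) / A[i][i]
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            break  # the last sweep changed every unknown by less than tol
    return x


# Hypothetical, strictly diagonally dominant example system.
A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
b = [1.0, 2.0, 3.0]
x = gauss_seidel(A, b)

# Residual b - A x, which should be close to zero at convergence.
residual = [b[i] - sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
```

Because each unknown is updated in place, the iteration only needs to store the current solution vector and never forms an inverse or factorisation of A, which is the kind of memory saving over a direct solution that the abstract alludes to.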