A comparison of session variability compensation approaches for speaker verification

Authors:
Mitchell McLaren; Robert Vogt; Brendan Baker; Sridha Sridharan
Affiliations:
Speech and Audio Research Laboratory, Queensland University of Technology, Brisbane, Australia (all authors)
Venue:
IEEE Transactions on Information Forensics and Security
Year:
2010

Abstract

This paper compares two of the leading techniques for session variability compensation in the context of support vector machine (SVM) speaker verification using Gaussian mixture model (GMM) mean supervectors: joint factor analysis (JFA) modeling and nuisance attribute projection (NAP). The comparison is motivated by the distinctly different domains in which these techniques operate: the probabilistic GMM domain for JFA versus the discriminative SVM kernel space for NAP. A theoretical analysis is given comparing the JFA and NAP approaches to variability compensation. The role of speaker factors in the factor analysis model is also contrasted with the scatter difference NAP objective of retaining speaker information in the SVM kernel space. These methods for retaining speaker variation are found to improve verification performance over the removal of channel effects alone. Overall, experimental results on the NIST 2006 and 2008 SRE corpora demonstrate the effectiveness of both JFA and NAP for reducing the effects of session variability. However, the overheads associated with implementing JFA may make NAP the more attractive technique, given its simple yet effective approach to variability compensation.
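
For orientation, the two compensation approaches contrasted in the abstract can be written in the forms commonly used in the wider speaker verification literature. This is only a sketch under assumed notation: the symbols below (M, m, V, y, U, x, D, z, W, P, phi, K) are not taken from this record and may differ from those used in the paper itself.

```latex
% JFA (GMM domain): the speaker- and session-dependent mean supervector M
% is decomposed into speaker and session terms.
%   m : speaker-independent (UBM) mean supervector
%   V : low-rank speaker subspace,  y : speaker factors
%   U : low-rank session subspace,  x : channel (session) factors
%   D : diagonal residual matrix,   z : speaker-specific residual
\mathbf{M} = \mathbf{m} + \mathbf{V}\mathbf{y} + \mathbf{U}\mathbf{x} + \mathbf{D}\mathbf{z}

% NAP (SVM kernel space): nuisance directions W (orthonormal columns)
% are projected out of the expanded supervector \phi(\cdot).
\mathbf{P} = \mathbf{I} - \mathbf{W}\mathbf{W}^{\mathsf{T}},
\qquad \mathbf{W}^{\mathsf{T}}\mathbf{W} = \mathbf{I}

% Compensated linear kernel; the second equality holds because
% P is symmetric and idempotent (P^T P = P).
K(\mathbf{a}, \mathbf{b})
  = \big(\mathbf{P}\,\phi(\mathbf{a})\big)^{\mathsf{T}} \big(\mathbf{P}\,\phi(\mathbf{b})\big)
  = \phi(\mathbf{a})^{\mathsf{T}}\, \mathbf{P}\, \phi(\mathbf{b})
```

In this notation, JFA compensates by estimating and discounting the session term Ux within the GMM, whereas NAP removes the subspace spanned by W before the SVM kernel is evaluated. In standard NAP the columns of W are typically chosen as the leading directions of within-speaker (session) scatter. The scatter difference variant mentioned in the abstract additionally aims to retain speaker information when choosing those directions; its exact objective is not stated in this record and is not reproduced here.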