Speaker and Session Variability in GMM-Based Speaker Verification

Authors:
P. Kenny; G. Boulianne; P. Ouellet; P. Dumouchel
Affiliations:
Centre de Recherche Informatique de Montréal, Quebec (P. Kenny)
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Abstract

We present a corpus-based approach to speaker verification in which maximum-likelihood II criteria are used to train a large-scale generative model of speaker and session variability, which we call joint factor analysis. Enrolling a target speaker consists of calculating the posterior distribution of the hidden variables in the factor analysis model, and verification tests are conducted using a new type of likelihood II ratio statistic. Using the NIST 1999 and 2000 speaker recognition evaluation data sets, we show that the effectiveness of this approach depends on the availability of a training corpus that is well matched to the evaluation set used for testing. Experiments on the NIST 1999 evaluation set using a mismatched corpus to train the factor analysis models did not yield any improvement over standard methods. However, we found that, even with this type of mismatch, feature warping performs extremely well in conjunction with the factor analysis model, which enabled us to obtain very good results (equal error rates of about 6.2%).
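Enrolling a speaker amounts to computing the posterior distribution of hidden variables given an utterance. The sketch below illustrates that step for a deliberately simplified model with a single hidden factor, s = m + V y with y ~ N(0, I), rather than the full joint speaker-and-session model. It assumes Baum-Welch statistics collected against a universal background model (UBM) with diagonal covariances. The function and argument names (speaker_factor_posterior, V, Sigma_diag, N_counts, F_centered) are hypothetical. Under these assumptions the posterior of y is Gaussian with precision L = I + V^T Sigma^-1 N V and mean L^-1 V^T Sigma^-1 F, where N is block diagonal in the per-component occupation counts.

```python
import numpy as np


def speaker_factor_posterior(V, Sigma_diag, N_counts, F_centered):
    """Posterior of y in a hypothetical simplified model s = m + V y, y ~ N(0, I).

    V          : (C*D, R) factor loading matrix
    Sigma_diag : (C*D,) diagonal of the UBM covariance supervector
    N_counts   : (C,) zeroth-order Baum-Welch statistics, one per mixture component
    F_centered : (C*D,) first-order statistics centred on the UBM means
    Supervectors are assumed to be laid out component by component.
    Returns (posterior mean, posterior covariance).
    """
    V = np.asarray(V, dtype=float)
    CD, R = V.shape
    C = len(N_counts)
    if CD % C != 0:
        raise ValueError("supervector size must be a multiple of the component count")
    D = CD // C
    # N is block diagonal: each component's count repeats over its D dimensions.
    N_expanded = np.repeat(np.asarray(N_counts, dtype=float), D)
    # Posterior precision: L = I + V^T Sigma^-1 N V (all diagonal terms folded in).
    L = np.eye(R) + V.T @ (V * (N_expanded / Sigma_diag)[:, None])
    # Posterior mean: L^-1 V^T Sigma^-1 F, computed with a linear solve.
    mean = np.linalg.solve(L, V.T @ (np.asarray(F_centered, dtype=float) / Sigma_diag))
    cov = np.linalg.inv(L)
    return mean, cov
```

With no data (all counts and statistics zero), the posterior collapses back to the prior N(0, I). As the counts grow, the prior's influence shrinks, which is the behaviour that makes posterior-based enrollment attractive for short utterances.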
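The abstract reports that feature warping works well alongside the factor analysis model even under corpus mismatch. Feature warping is commonly described as mapping each feature dimension, within a sliding window, onto a standard normal distribution according to the rank of the current frame's value. The sketch below follows that description. The window length (301 frames, roughly 3 s at a 10 ms frame rate), the clipping of windows at utterance boundaries, and the function name feature_warp are all assumptions, not details taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def feature_warp(features, window=301):
    """Warp each feature dimension to a standard normal over a sliding window.

    features : (T, D) array of frame-level features (e.g. cepstral coefficients)
    window   : window length in frames (301 is an assumed default)
    """
    features = np.asarray(features, dtype=float)
    T, D = features.shape
    half = window // 2
    inv_cdf = NormalDist().inv_cdf  # standard-normal quantile function
    warped = np.empty((T, D), dtype=float)
    for t in range(T):
        # Window centred on frame t, clipped at the utterance edges (one convention).
        lo, hi = max(0, t - half), min(T, t + half + 1)
        segment = features[lo:hi]
        n = hi - lo
        for d in range(D):
            # 1-based rank of the current value among the values in the window.
            rank = 1 + np.sum(segment[:, d] < features[t, d])
            # Map the rank to the corresponding standard-normal quantile.
            warped[t, d] = inv_cdf((rank - 0.5) / n)
    return warped
```

Because only ranks matter, the output is unchanged by any monotone transformation of a feature stream, which is one intuition for its robustness to channel effects. The nested loop is O(T x window x D) and is written for clarity, not speed.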
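Results in the abstract are quoted as equal error rates (EER), the operating point at which the miss rate equals the false-alarm rate. One coarse way to estimate the EER from lists of target and non-target trial scores is to sweep a threshold over the observed scores and take the point where the two error rates are closest. The sketch below does this. It is an illustrative approximation, not the NIST scoring software, and the function name equal_error_rate is hypothetical.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores."""
    tgt = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.unique(np.concatenate([tgt, non]))
    best_gap, eer = np.inf, None
    for th in thresholds:
        miss = np.mean(tgt < th)   # fraction of target trials rejected
        fa = np.mean(non >= th)    # fraction of non-target trials accepted
        gap = abs(miss - fa)
        if gap < best_gap:
            # Average the two rates at the closest crossing point.
            best_gap, eer = gap, (miss + fa) / 2
    return eer
```

For example, perfectly separated scores such as targets [2, 3, 4] and non-targets [0, 1] give an EER of 0. With only a few trials, the step-like error curves make this estimate coarse, so real evaluations with many trials give smoother values.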