A Study of Interspeaker Variability in Speaker Verification

Authors: P. Kenny, P. Ouellet, N. Dehak, V. Gupta, P. Dumouchel
Affiliation: Centre de Rech. Inf. de Montreal, Montreal, QC
Venue: IEEE Transactions on Audio, Speech, and Language Processing
Year: 2008

Abstract

We propose a new approach to the problem of estimating the hyperparameters that define the interspeaker variability model in joint factor analysis. We tested the proposed estimation technique on the NIST 2006 speaker recognition evaluation data and obtained 10%-15% reductions in error rates on the core condition and the extended data condition (as measured by both the equal error rate and the NIST detection cost function). We show that when a large joint factor analysis model is trained in this way and tested on the core condition, the extended data condition and the cross-channel condition, it can perform at least as well as fusions of multiple systems of other types. (The comparisons are based on the best results on these tasks that have been reported in the literature.) In the case of the cross-channel condition, a factor analysis model with 300 speaker factors and 200 channel factors can achieve equal error rates below 3.0%. This is a substantial improvement over the best results previously reported on this task.
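To make the model terms concrete, the sketch below assumes the usual joint factor analysis (JFA) decomposition of a GMM mean supervector, M = m + Vy + Dz + Ux. In that formulation m, V and D define the interspeaker variability model (the hyperparameters whose estimation the abstract addresses), and U spans channel variability. The sketch only draws supervectors from this generative model; it does not implement the hyperparameter estimation procedure proposed in the paper. All function names, toy dimensions and random loadings are hypothetical, and plain Python lists stand in for a linear-algebra library so that the example runs on the standard library alone.

```python
import random


def mat_vec(A, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def standard_normal(n, rng):
    """n independent draws from N(0, 1), used for the latent factors."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def speaker_supervector(m, V, y, d, z):
    """Interspeaker part: s = m + V y + D z, with D diagonal (stored as its diagonal d)."""
    return [mi + vy + di * zi for mi, vy, di, zi in zip(m, mat_vec(V, y), d, z)]


def session_supervector(s, U, x):
    """Intersession part: M = s + U x, where x holds the channel factors."""
    return [si + ux for si, ux in zip(s, mat_vec(U, x))]


# Toy dimensions (hypothetical). CF is the supervector size (number of Gaussians
# times feature dimension); the abstract's large model uses 300 speaker factors
# and 200 channel factors, far more than this example.
CF, R_SPEAKER, R_CHANNEL = 6, 2, 1
rng = random.Random(0)
m = [0.0] * CF                                            # speaker/channel-independent mean supervector
V = [standard_normal(R_SPEAKER, rng) for _ in range(CF)]  # eigenvoice loadings (CF x R_SPEAKER)
U = [standard_normal(R_CHANNEL, rng) for _ in range(CF)]  # eigenchannel loadings (CF x R_CHANNEL)
d = [0.1] * CF                                            # diagonal of D

# One speaker, three recording sessions: s is shared, only U x changes.
s = speaker_supervector(m, V, standard_normal(R_SPEAKER, rng), d, standard_normal(CF, rng))
sessions = [session_supervector(s, U, standard_normal(R_CHANNEL, rng)) for _ in range(3)]
```

Because every session of the same speaker shares s, the session supervectors differ only along the columns of U. That is the intersession variability the channel factors absorb, while y and z carry the speaker information used for verification.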
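The abstract reports results as equal error rates (EER) and values of the NIST detection cost function (DCF). A minimal sketch of both metrics, computed from lists of target and non-target trial scores by sweeping a decision threshold, follows. It assumes the cost parameters used in the NIST speaker recognition evaluations of that period (C_miss = 10, C_fa = 1, P_target = 0.01) and reports the unnormalized minimum DCF. The helper names are hypothetical, and the EER is a simple threshold-sweep approximation rather than the interpolated value that official scoring tools may report.

```python
import math

# Assumed NIST SRE cost parameters: cost of a miss, cost of a false alarm, target prior.
C_MISS, C_FA, P_TARGET = 10.0, 1.0, 0.01


def error_rates(target_scores, nontarget_scores, threshold):
    """Miss and false-alarm rates when trials with score >= threshold are accepted."""
    p_miss = sum(s < threshold for s in target_scores) / len(target_scores)
    p_fa = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa


def candidate_thresholds(target_scores, nontarget_scores):
    """Every observed score, plus +inf (reject every trial)."""
    return sorted(set(target_scores) | set(nontarget_scores)) + [math.inf]


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the sweep point where P_miss and P_fa are closest, averaged."""
    best = None
    for t in candidate_thresholds(target_scores, nontarget_scores):
        p_miss, p_fa = error_rates(target_scores, nontarget_scores, t)
        gap = abs(p_miss - p_fa)
        if best is None or gap < best[0]:
            best = (gap, (p_miss + p_fa) / 2)
    return best[1]


def min_dcf(target_scores, nontarget_scores,
            c_miss=C_MISS, c_fa=C_FA, p_target=P_TARGET):
    """Minimum over thresholds of C_miss*P_miss*P_target + C_fa*P_fa*(1 - P_target)."""
    return min(
        c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
        for p_miss, p_fa in (
            error_rates(target_scores, nontarget_scores, t)
            for t in candidate_thresholds(target_scores, nontarget_scores)
        )
    )


targets = [0.9, 0.8, 0.7, 0.2]
nontargets = [0.1, 0.3, 0.4, 0.6]
print(equal_error_rate(targets, nontargets))  # 0.25
print(min_dcf(targets, nontargets))
```

The sweep recomputes both rates at every threshold, which is quadratic in the number of trials; that keeps the sketch short, whereas a scorer for full evaluation lists would sort the scores once and update the rates incrementally.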