Comparative evaluation of maximum a posteriori vector quantization and Gaussian mixture models in speaker verification

  • Authors:
  • Tomi Kinnunen; Juhani Saastamoinen; Ville Hautamäki; Mikko Vinni; Pasi Fränti

  • Affiliations:
  • Speech and Image Processing Unit (SIPU), Department of Computer Science and Statistics, University of Joensuu, P.O. Box 111, FI-80101 Joensuu, Finland (all authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2009


Abstract

The Gaussian mixture model with universal background model (GMM-UBM) is a standard reference classifier in speaker verification. We have recently proposed a simplified model using vector quantization (VQ-UBM). In this study, we extensively compare these two classifiers on the NIST 2005, 2006 and 2008 SRE corpora, with a standard discriminative classifier (GLDS-SVM) as a point of reference. We focus on parameter settings for N-top scoring, model order, and performance under different amounts of training data. The most interesting result, contrary to common belief, is that GMM-UBM yields better results for short segments, whereas VQ-UBM performs well on long utterances. The results also suggest that maximum likelihood training of the UBM is suboptimal, and hence alternative ways to train the UBM should be considered.
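To make the GMM-UBM scoring referred to in the abstract concrete, the sketch below shows the standard average log-likelihood-ratio verification score with top-N scoring: each frame is scored against all UBM components, and only the N best components are re-evaluated under the speaker model. This is a minimal illustration with NumPy, assuming diagonal covariances and a MAP-adapted speaker model in which only the means are adapted (weights and variances shared with the UBM); it is not the authors' implementation, and all names and shapes here are illustrative.

```python
import numpy as np

def logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along the given axis
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)

def component_loglik(X, means, variances):
    # (T, M) matrix of per-component diagonal-Gaussian log-densities
    # X: (T, D) frames; means, variances: (M, D)
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))

def gmm_ubm_score(X, weights, ubm_means, variances, spk_means, top_n=5):
    """Average per-frame log-likelihood ratio between a MAP-adapted
    speaker model (means only) and the UBM, using top-N scoring:
    only the top_n best UBM components per frame enter both sums."""
    logw = np.log(weights)
    ubm_ll = logw[None, :] + component_loglik(X, ubm_means, variances)
    spk_ll = logw[None, :] + component_loglik(X, spk_means, variances)
    # Indices of the top_n highest-scoring UBM components, per frame
    top = np.argsort(ubm_ll, axis=1)[:, -top_n:]
    rows = np.arange(X.shape[0])[:, None]
    llr = (logsumexp(spk_ll[rows, top], axis=1)
           - logsumexp(ubm_ll[rows, top], axis=1))
    return llr.mean()
```

With the speaker means equal to the UBM means the ratio is exactly zero, and adapting the means toward the test data makes the score positive; in a real system the score would then be compared to a verification threshold.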