Comparing maximum a posteriori vector quantization and Gaussian mixture models in speaker verification

  • Authors:
  • Tomi Kinnunen, Juhani Saastamoinen, Ville Hautamäki, Mikko Vinni, Pasi Fränti

  • Affiliation (all authors):
  • Speech and Image Processing Unit (SIPU), Dept. of Computer Science and Statistics, University of Joensuu, P.O. Box 111, FI-80101, Finland

  • Venue:
  • ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
  • Year:
  • 2009

Abstract

The Gaussian mixture model with universal background model (GMM-UBM) is a standard reference classifier in speaker verification. We have recently proposed a simplified model based on vector quantization (VQ-UBM). In this study, we extensively compare these two classifiers on the NIST 2005, 2006, and 2008 SRE corpora, using a standard discriminative classifier (GLDS-SVM) as a reference point. We focus on the parameter setting for top-N scoring, the model order, and performance under different amounts of training data. The most interesting result, contrary to common belief, is that GMM-UBM yields better results for short segments, whereas VQ-UBM performs better on long utterances. The results also suggest that maximum likelihood training of the UBM is sub-optimal, and hence alternative ways to train the UBM should be considered.
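The two scoring schemes compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal-covariance GMMs, Euclidean VQ codebooks, and average squared-error distortion; all function names and the top-N indexing strategy are illustrative assumptions.

```python
import numpy as np

def gmm_logpdf(X, weights, means, covs):
    """Per-component weighted log-densities of a diagonal-covariance GMM.
    X: (T, D) frames; weights: (M,); means, covs: (M, D). Returns (T, M)."""
    D = X.shape[1]
    log_det = np.sum(np.log(covs), axis=1)                       # (M,)
    diff2 = ((X[:, None, :] - means[None, :, :]) ** 2) / covs[None, :, :]
    log_norm = -0.5 * (D * np.log(2 * np.pi) + log_det)          # (M,)
    return np.log(weights)[None, :] + log_norm[None, :] - 0.5 * diff2.sum(axis=2)

def gmm_ubm_llr(X, ubm, spk, top_n=5):
    """Average log-likelihood ratio with top-N scoring: for each frame,
    the N best-scoring UBM components are selected, and only those
    components are evaluated in both the speaker model and the UBM."""
    ubm_ll = gmm_logpdf(X, *ubm)                                 # (T, M)
    spk_ll = gmm_logpdf(X, *spk)
    top = np.argsort(ubm_ll, axis=1)[:, -top_n:]                 # (T, top_n)
    rows = np.arange(X.shape[0])[:, None]
    def lse(a):  # row-wise log-sum-exp, numerically stable
        m = a.max(axis=1, keepdims=True)
        return np.log(np.exp(a - m).sum(axis=1)) + m[:, 0]
    return np.mean(lse(spk_ll[rows, top]) - lse(ubm_ll[rows, top]))

def vq_ubm_score(X, ubm_codebook, spk_codebook):
    """VQ-UBM analogue: difference of average quantization distortions,
    UBM distortion minus speaker distortion (higher = more target-like)."""
    d_ubm = np.min(((X[:, None, :] - ubm_codebook[None]) ** 2).sum(-1), axis=1)
    d_spk = np.min(((X[:, None, :] - spk_codebook[None]) ** 2).sum(-1), axis=1)
    return np.mean(d_ubm - d_spk)
```

The VQ-UBM scorer replaces the per-frame soft GMM likelihood with a hard nearest-centroid distortion, which is why it can be viewed as a simplified (hard-quantized) special case of GMM-UBM scoring.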