Comparison between supervised and unsupervised learning of probabilistic linear discriminant analysis mixture models for speaker verification

Authors:
Timur Pekhovsky;Aleksandr Sizov
Venue:
Pattern Recognition Letters
Year:
2013

Abstract

We compare speaker verification systems based on unsupervised and supervised mixtures of probabilistic linear discriminant analysis (PLDA) models. We explore the current applicability of unsupervised mixtures of PLDA models with Gaussian priors in a total variability space for speaker verification, and we analyze the experimental conditions under which this approach is advantageous, taking into account the limited size of the training databases provided by the National Institute of Standards and Technology (NIST). We also present a full derivation of the maximum likelihood learning procedure for the PLDA mixture. Experimental results for a cross-channel NIST Speaker Recognition Evaluation (SRE) 2010 verification task show that the unsupervised PLDA mixture is more effective than other state-of-the-art methods. In particular, for this task a combination of a homogeneous i-vector extractor and a mixture of two Gaussian PLDA models is more effective than a cross-channel i-vector extractor with a single Gaussian PLDA model.
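
For orientation, one common way to write a Gaussian PLDA model over i-vectors, together with a mixture extension, is sketched below. The notation (w_{ij}, \mu, V, y_i, \Sigma, \pi_k, z_{ij}) is an assumption introduced here for illustration and is not taken from the paper, whose exact parameterization may differ.

```latex
% Simplified Gaussian PLDA: w_{ij} is the i-vector of session j of speaker i
% (all symbols are illustrative assumptions, not the paper's notation)
w_{ij} = \mu + V y_i + \varepsilon_{ij}, \qquad
y_i \sim \mathcal{N}(0, I), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \Sigma)

% Mixture of K Gaussian PLDA models with a component indicator z_{ij}
p(z_{ij} = k) = \pi_k, \qquad \sum_{k=1}^{K} \pi_k = 1

w_{ij} \mid (z_{ij} = k) = \mu_k + V_k y_i + \varepsilon_{ij}, \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \Sigma_k)
```

Under this reading, supervised training would fix the indicators z_{ij} from known labels (for example, channel type), whereas unsupervised training would treat them as latent variables to be estimated from the data, typically with an EM-style procedure. The abstract's "mixture of two Gaussian PLDA models" would correspond to K = 2.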