Among conventional methods for text-independent speaker recognition, the Gaussian mixture model (GMM) is known for its effectiveness and scalability in modeling the spectral distribution of speech. A GMM-supervector characterizes a speaker's voice by the GMM parameters: the mean vectors, covariance matrices, and mixture weights. Beyond the first-order statistics, a speaker's cues are generally believed to be partly conveyed by the second-order statistics. In this paper, we introduce a Bhattacharyya-based GMM-distance to measure the distance between two GMM distributions. We then introduce the GMM-UBM mean interval (GUMI) concept to derive a GUMI kernel that can be used in conjunction with a support vector machine (SVM) for speaker recognition. The GUMI kernel allows us to exploit the speaker's information not only from the mean vectors of the GMM but also from the covariance matrices. Moreover, by analyzing the Bhattacharyya-based GMM-distance measure, we extend the Bhattacharyya-based kernel to incorporate both the mean and covariance statistical dissimilarities. We demonstrate the effectiveness of the new kernel on the National Institute of Standards and Technology (NIST) 2006 Speaker Recognition Evaluation (SRE) dataset.
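The Bhattacharyya-based GMM-distance described above builds on the standard closed-form Bhattacharyya distance between two Gaussian densities, which combines a Mahalanobis-style mean term with a covariance (log-determinant) term. The sketch below implements only that textbook per-Gaussian formula in NumPy as an illustration; it is not the paper's full GMM-distance or GUMI kernel, and the function name is hypothetical.

```python
import numpy as np

def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between two Gaussians.

    D_B = 1/8 (mu1-mu2)^T S^{-1} (mu1-mu2)
          + 1/2 ln( det S / sqrt(det cov1 * det cov2) ),
    where S = (cov1 + cov2) / 2. The first term captures the mean
    (first-order) dissimilarity, the second the covariance
    (second-order) dissimilarity -- the two kinds of statistics the
    abstract's kernel is designed to exploit.
    """
    s = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    # Mean term: 1/8 * diff^T S^{-1} diff, via a linear solve
    # instead of an explicit matrix inverse.
    term_mean = 0.125 * diff @ np.linalg.solve(s, diff)
    # Covariance term via numerically stable log-determinants.
    _, logdet_s = np.linalg.slogdet(s)
    _, logdet_1 = np.linalg.slogdet(cov1)
    _, logdet_2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet_s - 0.5 * (logdet_1 + logdet_2))
    return term_mean + term_cov
```

With identical Gaussians the distance is zero; with shared identity covariances only the mean term remains, e.g. means `[0, 0]` and `[2, 0]` give `0.125 * 4 = 0.5`.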