Comparison of the impact of some Minkowski metrics on VQ/GMM based speaker recognition

Authors:
Cemal Hanilçi;Figen Ertaş
Affiliations:
Electronic Engineering Department, Uludag University, Bursa, Turkey;Electronic Engineering Department, Uludag University, Bursa, Turkey
Venue:
Computers and Electrical Engineering
Year:
2011

Abstract

This paper evaluates the impact of three special forms of the Minkowski metric (Euclidean, City Block, and Chebychev distances) on the performance of conventional vector quantization (VQ) and Gaussian mixture model (GMM) based closed-set, text-independent speaker recognition systems, in terms of recognition rate and confidence in decisions. For the VQ-based system, evaluations are carried out using the two most common clustering algorithms, LBG and K-means, and we identify which pairing of clustering algorithm and distance gives the best recognition rate for a given codebook size. For the GMM-based system, we introduce the metrics into the GMM by using a concatenation of the LBG and K-means algorithms to estimate the initial mean vectors, to which system performance is sensitive, and explore their impact on that performance. We also compare results obtained on clean speech (TIMIT) and telephone speech (NTIMIT and NIST2001) databases with those of the modern classifiers VQ-UBM and GMM-UBM. We find cases in which the conventional VQ-based system outperforms the modern systems. Moreover, the impact of the distance metrics on the performance of both the conventional and the modern systems depends on the recognition task (verification or identification).
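To make the three metrics concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the Minkowski distance of order p, where p = 1 is City Block, p = 2 is Euclidean, and p = ∞ is Chebychev, and uses it in a nearest-codeword VQ identification rule. The function names, the average-distortion scoring rule, and the toy two-dimensional data are assumptions made for illustration; the abstract does not specify features, codebook sizes, or the exact scoring used.

```python
import math


def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors.

    p = 1 is City Block, p = 2 is Euclidean, p = math.inf is Chebychev.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(p):
        # Limit of the p-norm as p grows: the largest coordinate difference.
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


def average_distortion(features, codebook, p):
    """Mean distance from each feature vector to its nearest codeword."""
    total = 0.0
    for x in features:
        total += min(minkowski(x, c, p) for c in codebook)
    return total / len(features)


def identify(features, codebooks, p):
    """Closed-set identification: pick the speaker with the lowest distortion.

    `codebooks` maps a speaker label to a list of codeword vectors
    (a hypothetical structure chosen for this sketch).
    """
    return min(
        codebooks,
        key=lambda spk: average_distortion(features, codebooks[spk], p),
    )


# Toy data (made up): two speakers, two codewords each, 2-D features.
codebooks = {
    "A": [(0.0, 0.0), (1.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 4.0)],
}
test_features = [(0.2, 0.1), (0.9, 1.2)]

for p in (1, 2, math.inf):
    print(p, identify(test_features, codebooks, p))
```

On well-separated toy data like this, all three metrics select the same speaker. The question the paper studies is how the choice of metric changes recognition rate on real speech, where speaker codebooks overlap.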