Nonparametric Speaker Recognition Method Using Earth Mover's Distance

Authors:
Shingo Kuroiwa; Yoshiyuki Umeda; Satoru Tsuge; Fuji Ren
Affiliations:
The authors are with the Faculty of Engineering, the University of Tokushima, Tokushima-shi, 770-8506 Japan. E-mail: kuroiwa@is.tokushima-u.ac.jp
Venue:
IEICE - Transactions on Information and Systems
Year:
2006

Citing 0
Cited 3

Speaker Verification in Realistic Noisy Environment in Forensic Science
IEICE - Transactions on Information and Systems

Evaluation of EMD-based speaker recognition using ISCSLP2006 Chinese speaker recognition evaluation corpus
ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing

Exploring similarity-based classification of larynx disorders from human voice
Speech Communication

Abstract

In this paper, we propose a distributed speaker recognition method that uses a nonparametric speaker model and the Earth Mover's Distance (EMD). In distributed speaker recognition, quantized feature vectors are sent to a server. The Gaussian mixture model (GMM), the traditional speaker model, is trained by maximum likelihood estimation, but it is difficult to fit continuous density functions to quantized data. To overcome this problem, the proposed method represents each speaker model as a speaker-dependent VQ code histogram built from the registered feature vectors, and directly calculates the distance between the histogram of each speaker model and that of the quantized test feature vectors. To measure this distance, we use EMD, which can compare histograms that have different bins. We conducted text-independent speaker identification experiments using the proposed method. Compared with the traditional GMM, the proposed method yielded a relative error reduction of 32% for quantized data.
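
As a rough illustration of the idea described in the abstract, the following Python sketch compares code histograms whose bins differ and picks the closest speaker. It is not the authors' implementation. It assumes scalar (1-D) features, so that EMD between two normalized histograms reduces to a sweep over the sorted bin positions; real feature vectors are multi-dimensional, and EMD then requires solving a transportation problem. The function names (`code_histogram`, `emd_1d`, `identify`), the codebooks, and all numbers are hypothetical.

```python
def code_histogram(samples, codebook):
    """Count how many samples fall nearest to each codeword (the VQ bins)."""
    counts = [0] * len(codebook)
    for s in samples:
        nearest = min(range(len(codebook)), key=lambda k: abs(s - codebook[k]))
        counts[nearest] += 1
    return counts


def emd_1d(bins_p, counts_p, bins_q, counts_q):
    """EMD between two 1-D histograms that may have different bin positions.

    Assumption: both histograms are non-empty and are normalized to unit
    mass, so all of the mass must be moved. In 1-D the minimum work is the
    absolute mass imbalance carried across each gap between sorted positions.
    """
    total_p, total_q = sum(counts_p), sum(counts_q)
    events = [(x, c / total_p) for x, c in zip(bins_p, counts_p)]
    events += [(x, -c / total_q) for x, c in zip(bins_q, counts_q)]
    events.sort()

    work = 0.0
    carried = 0.0          # surplus (+) or deficit (-) of mass seen so far
    previous = events[0][0]
    for position, signed_mass in events:
        work += abs(carried) * (position - previous)
        carried += signed_mass
        previous = position
    return work


def identify(test_bins, test_counts, speaker_models):
    """Return the name of the speaker model with the smallest EMD to the test data."""
    return min(
        speaker_models,
        key=lambda name: emd_1d(test_bins, test_counts, *speaker_models[name]),
    )


# Hypothetical speaker-dependent codebooks (bin positions) and histograms.
speaker_models = {
    "speaker_a": ([0.0, 1.0, 2.0], [6, 3, 1]),
    "speaker_b": ([1.5, 3.0, 4.5], [2, 5, 3]),
}

# Hypothetical test data, quantized with a different codebook.
test_bins = [0.5, 2.5, 4.0]
test_counts = code_histogram([0.4, 0.6, 0.3, 2.4, 0.7], test_bins)
best = identify(test_bins, test_counts, speaker_models)
```

Because the distance is computed from bin positions and masses rather than bin indices, the test histogram does not need to share a codebook with any speaker model, which is the property the abstract relies on.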