Speaker verification is usually formulated as a statistical hypothesis testing problem and solved by a log-likelihood ratio (LLR) test. A speaker verification system's performance depends heavily on how well the target speaker's voice (the null hypothesis) is modeled and how well non-target speakers' voices (the alternative hypothesis) are characterized. However, since the alternative hypothesis involves unknown impostors, it is usually difficult to characterize a priori. In this paper, we propose a framework to better characterize the alternative hypothesis with the goal of optimally distinguishing the target speaker from impostors. The proposed framework is built on a weighted arithmetic combination (WAC) or a weighted geometric combination (WGC) of useful information extracted from a set of pre-trained background models. The parameters associated with WAC or WGC are then optimized using two discriminative training methods, namely, the minimum verification error (MVE) training method and the proposed evolutionary MVE (EMVE) training method, such that both the false acceptance probability and the false rejection probability are minimized. Our experimental results show that the proposed framework outperforms conventional LLR-based approaches.
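To make the scoring scheme concrete, the following is a minimal sketch of an LLR test in which the alternative-hypothesis likelihood is formed by a WAC or WGC over a set of background-model likelihoods. The function names, the uniform example weights, and the zero decision threshold are illustrative assumptions, not part of the paper; in the proposed framework the weights would be learned by MVE or EMVE training rather than fixed by hand.

```python
import math

def wac(bg_likelihoods, weights):
    """Weighted arithmetic combination of background-model likelihoods."""
    return sum(w * p for w, p in zip(weights, bg_likelihoods))

def wgc(bg_likelihoods, weights):
    """Weighted geometric combination of background-model likelihoods."""
    return math.prod(p ** w for w, p in zip(weights, bg_likelihoods))

def llr_score(target_likelihood, bg_likelihoods, weights, combine=wgc):
    """Log-likelihood ratio: target model vs. combined alternative model."""
    return math.log(target_likelihood) - math.log(combine(bg_likelihoods, weights))

def verify(score, threshold=0.0):
    """Accept the claimed identity if the LLR exceeds the threshold."""
    return "accept" if score >= threshold else "reject"

# Hypothetical utterance likelihoods under the target and two background models,
# with uniform weights standing in for the discriminatively trained ones.
target_p = 0.6
bg_p = [0.2, 0.4]
w = [0.5, 0.5]
decision = verify(llr_score(target_p, bg_p, w, combine=wgc))
```

Note the design difference between the two combinations: WGC is an average in the log domain, so a single background model with near-zero likelihood can dominate the alternative score, whereas WAC is dominated by the best-matching background model.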