Discriminative In-Set/Out-of-Set Speaker Recognition

Authors:
P. Angkititrakul;J. H. L. Hansen
Affiliations:
Center for Robust Speech Syst., Texas Univ., Richardson, TX
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Citing 0
Cited 5

Analysis and compensation of Lombard speech across noise type and levels with application to in-set/out-of-set speaker recognition
IEEE Transactions on Audio, Speech, and Language Processing

Detection of unknown speakers in an unsupervised speech controlled system
IWSDS'10 Proceedings of the Second international conference on Spoken dialogue systems for ambient environments

Short Communication: Speaker recognition under limited data condition by noise addition
Expert Systems with Applications: An International Journal

Self-learning speaker identification for enhanced speech recognition
Computer Speech and Language

In-set/out-of-set speaker recognition in sustained acoustic scenarios using sparse data
Speech Communication

Quantified Score

Hi-index	0.00

Abstract

In this paper, the problem of identifying in-set versus out-of-set speakers for limited training/test data durations is addressed. The recognition objective is to form a decision regarding an input speaker as being a legitimate member of a set of enrolled speakers or outside speakers. The general goal is to perform rapid speaker model construction from limited enrollment and test size resources for in-set testing for input audio streams. In-set detection can help ensure security and proper access to private information, as well as detecting and tracking input speakers. Areas of applications of these concepts include rapid speaker tagging and tracking for information retrieval, communication networks, personal device assistants, and location access. We propose an integrated system with emphasis on short-enrollment data (about 5 s of speech for each enrolled speaker) and test data (2-8 s) within a text-independent mode. We present a simple and yet powerful decision rule to accept or reject speakers using a discriminative vector in the decision score space, together with statistical hypothesis testing based on the conventional likelihood ratio test. Discriminative training is introduced to further improve system performance for both decision techniques, by employing minimum classification error and minimum verification error frameworks. Experiments are performed using three separate corpora. Using the YOHO speaker recognition database, the alternative decision rule achieves measurable improvement over the likelihood ratio test, and discriminative training consistently enhances overall system performance with relative improvements ranging from 11.26%-28.68%. A further extended evaluation using the TIMIT (CORPUS1) and actual noisy aircraft communications data (CORPUS2) shows measurable improvement over the traditional MAP based scheme using the likelihood ratio test (MAP-LRT), with average EERs of 9%-23% for TIMIT and 13%-32% for noisy aircraft communications. 
The results confirm that an effective in-set/out-of-set speaker recognition system can be formulated using discriminative training for rapid tagging of input speakers from limited training and test data sizes.
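The conventional likelihood ratio test mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `in_set_decision`, the use of average log-likelihoods, the single background-model score, and the threshold value are all assumptions made for illustration. The sketch picks the best-scoring enrolled speaker, subtracts the background score to form a log-likelihood ratio, and accepts the input as in-set when the ratio reaches the threshold.

```python
from typing import Sequence, Tuple


def in_set_decision(
    speaker_log_likes: Sequence[float],
    background_log_like: float,
    threshold: float = 0.0,
) -> Tuple[bool, int, float]:
    """Hypothetical likelihood ratio test for in-set/out-of-set decisions.

    speaker_log_likes: average log-likelihood of the test utterance under
        each enrolled speaker model (assumed to be computed elsewhere).
    background_log_like: average log-likelihood under a background model
        (an assumption; the abstract does not specify the alternative model).
    threshold: decision threshold on the log-likelihood ratio (assumed value).

    Returns (accepted, best_speaker_index, log_likelihood_ratio).
    """
    if not speaker_log_likes:
        raise ValueError("at least one enrolled speaker score is required")

    # Index of the enrolled speaker model that best explains the utterance.
    best_index = max(range(len(speaker_log_likes)),
                     key=lambda i: speaker_log_likes[i])

    # Log-likelihood ratio between the best in-set model and the background.
    llr = speaker_log_likes[best_index] - background_log_like

    # Accept as in-set only if the ratio reaches the threshold.
    return llr >= threshold, best_index, llr


# Illustrative scores only; these numbers are not taken from the paper.
accepted, best, llr = in_set_decision([-42.0, -39.5, -44.1], -41.0, threshold=0.5)
rejected, _, _ = in_set_decision([-42.0, -41.5, -44.1], -41.0, threshold=0.5)
```

In this toy example the first utterance is accepted because the best enrolled model beats the background score by more than the threshold, while the second is rejected. The discriminative score-space rule and the MCE/MVE training described in the abstract are not reproduced here.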