An Introduction to Application-Independent Evaluation of Speaker Recognition Systems

  • Authors:
  • David A. van Leeuwen; Niko Brümmer

  • Affiliations:
  • TNO Human Factors, Soesterberg, The Netherlands; Spescom DataVoice, Stellenbosch, South Africa

  • Venue:
  • Speaker Classification I
  • Year:
  • 2007

Abstract

In the evaluation of speaker recognition systems (an important part of speaker classification [1]), the trade-off between missed speakers and false alarms has always been an important diagnostic tool. NIST has defined the task of speaker detection with the associated Detection Cost Function (DCF) to evaluate performance, and introduced the DET plot [2] as a diagnostic tool. Since the first evaluation in 1996, these evaluation tools have been embraced by the research community. Although it is an excellent measure, the DCF has the limitation that its parameters imply a particular application of the speaker detection technology. In this chapter we introduce an evaluation measure that instead averages detection performance over application types. This metric was first introduced in 2004 by one of the authors [3]. Here we introduce the subject with a minimum of mathematical detail, concentrating on the various interpretations of the metric and its practical application. We will emphasize the difference between the discrimination abilities of a speaker detector (`the position/shape of the DET curve') and the calibration of the detector (`how well was the threshold set'). If speaker detectors can be built to output well-calibrated log-likelihood-ratio scores, such detectors can be said to have an application-independent calibration. The proposed metric can properly evaluate the discrimination abilities of the log-likelihood-ratio scores, as well as the quality of the calibration.
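As a rough illustration of the two cost measures contrasted in the abstract, the sketch below computes the NIST DCF at one operating point (its prior and error costs encode a specific application; the values shown are the classic NIST parameters, assumed here for illustration) and a log-likelihood-ratio-based cost of the kind the chapter proposes, which averages detection cost over applications. The function names and the exact parameter defaults are this sketch's assumptions, not definitions taken from the chapter.

```python
import math

def dcf(p_miss, p_fa, p_target=0.01, c_miss=10.0, c_fa=1.0):
    """Detection Cost Function: expected cost of a detector at one threshold.
    The prior p_target and the costs c_miss, c_fa fix a particular
    application of the detection technology (defaults: classic NIST values,
    assumed for illustration)."""
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa

def llr_cost(target_llrs, nontarget_llrs):
    """Application-averaged cost of a set of log-likelihood-ratio scores
    (natural-log LLRs). It penalizes both poor discrimination and poor
    calibration: a detector that always outputs LLR = 0 ("no information",
    honestly reported) scores exactly 1.0."""
    c_tar = sum(math.log2(1.0 + math.exp(-s)) for s in target_llrs) / len(target_llrs)
    c_non = sum(math.log2(1.0 + math.exp(s)) for s in nontarget_llrs) / len(nontarget_llrs)
    return 0.5 * (c_tar + c_non)
```

A detector with confident, correctly signed LLRs drives `llr_cost` toward 0, while confidently wrong LLRs push it above 1.0, the "no information" reference value; the DCF, by contrast, only sees the hard decisions made at one threshold.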