Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006

Authors:
N. Brummer;L. Burget;J. H. Cernocky;O. Glembek;F. Grezl;M. Karafiat;D. A. van Leeuwen;P. Matejka;P. Schwarz;A. Strasheim
Affiliations:
Spescom DataVoice, Stellenbosch;-;-;-;-;-;-;-;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Citing: 0
Cited by: 11

The likelihood ratio decision criterion for nuisance attribute projection in GMM speaker verification

EURASIP Journal on Advances in Signal Processing
Analysis of the Utility of Classical and Novel Speech Quality Measures for Speaker Verification

ICB '09 Proceedings of the Third International Conference on Advances in Biometrics
Support Vector Machine Regression for Robust Speaker Verification in Mismatching and Forensic Conditions

ICB '09 Proceedings of the Third International Conference on Advances in Biometrics
An overview of text-independent speaker recognition: From features to supervectors

Speech Communication
An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems

The Journal of Machine Learning Research
Comparison of speaker adaptation methods as feature extraction for SVM-based speaker recognition

IEEE Transactions on Audio, Speech, and Language Processing
Emotion recognition from speech by combining databases and fusion of classifiers

TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
A comparison of procedures for the calculation of forensic likelihood ratios from acoustic-phonetic data: Multivariate kernel density (MVKD) versus Gaussian mixture model-universal background model (GMM-UBM)

Speech Communication
Quality-based conditional processing in multi-biometrics: application to sensor interoperability

IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
Application of speaker- and language identification state-of-the-art techniques for emotion recognition

Speech Communication
Fusion of discriminative and generative scoring criteria in GMM-based speaker verification

TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue

Abstract

This paper describes and discusses the "STBU" speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The STBU system was a combination of three main kinds of subsystems: 1) Gaussian mixture model (GMM), with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features; 2) GMM-support vector machine (GMM-SVM), using GMM mean supervectors as input to an SVM; and 3) maximum-likelihood linear regression-support vector machine (MLLR-SVM), using MLLR speaker adaptation coefficients derived from an English large vocabulary continuous speech recognition (LVCSR) system. All subsystems made use of supervector subspace channel compensation methods: either eigenchannel adaptation or nuisance attribute projection. We document the design and performance of all subsystems, as well as their fusion and calibration via logistic regression. Finally, we also present a cross-site fusion that was done with several additional systems from other NIST SRE-2006 participants.
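
The fusion step mentioned in the abstract can be illustrated with a minimal sketch of linear logistic regression over subsystem scores. This is not the STBU implementation: the function names (train_fusion, make_trials), the plain gradient-descent optimizer, the learning rate, and the synthetic Gaussian scores are all assumptions made for illustration. The idea shown is only that each trial's subsystem scores are combined as a weighted sum plus an offset, with the weights fitted by minimizing the logistic (cross-entropy) loss on labeled trials.

```python
import numpy as np

def train_fusion(scores, labels, lr=0.1, n_iter=2000):
    """Fit fused = scores @ w + b by minimizing the mean logistic loss.

    scores: (n_trials, n_subsystems) array of subsystem scores.
    labels: (n_trials,) array, 1 for target trials, 0 for non-target trials.
    Plain batch gradient descent is an illustrative choice, not the paper's.
    """
    n, k = scores.shape
    w = np.zeros(k)
    b = 0.0
    for _ in range(n_iter):
        logits = scores @ w + b
        p = 1.0 / (1.0 + np.exp(-logits))   # predicted target probability
        err = p - labels                     # gradient of the loss w.r.t. logits
        w -= lr * (scores.T @ err) / n
        b -= lr * err.mean()
    return w, b

def make_trials(rng, n_per_class, n_subsystems):
    """Synthetic scores (hypothetical data): targets near +1, non-targets near -1."""
    tar = rng.normal(1.0, 1.0, size=(n_per_class, n_subsystems))
    non = rng.normal(-1.0, 1.0, size=(n_per_class, n_subsystems))
    scores = np.vstack([tar, non])
    labels = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
    return scores, labels

rng = np.random.default_rng(0)
train_scores, train_labels = make_trials(rng, 1000, 3)
eval_scores, eval_labels = make_trials(rng, 1000, 3)

# Fit fusion weights on one trial set, apply them to a held-out set.
w, b = train_fusion(train_scores, train_labels)
fused = eval_scores @ w + b

is_target = eval_labels == 1
acc_fused = np.mean((fused > 0) == is_target)
acc_single = [np.mean((eval_scores[:, i] > 0) == is_target) for i in range(3)]
print("weights:", w, "offset:", b)
print("fused accuracy:", acc_fused, "single-system accuracies:", acc_single)
```

Because the training set in this sketch has equal numbers of target and non-target trials, the fused score can be read roughly as a log-likelihood ratio estimate, which is the sense in which logistic regression serves for calibration as well as fusion. With unbalanced training data, a prior-dependent offset would need to be accounted for, which this sketch does not do.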