This paper describes and discusses the "STBU" speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The STBU system was a combination of three main kinds of subsystems: (1) GMM, with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features; (2) Gaussian mixture model-support vector machine (GMM-SVM), using GMM mean supervectors as input to an SVM; and (3) maximum-likelihood linear regression-support vector machine (MLLR-SVM), using MLLR speaker adaptation coefficients derived from an English large vocabulary continuous speech recognition (LVCSR) system. All subsystems made use of supervector subspace channel compensation methods: either eigenchannel adaptation or nuisance attribute projection. We document the design and performance of all subsystems, as well as their fusion and calibration via logistic regression. Finally, we also present a cross-site fusion that was done with several additional systems from other NIST SRE-2006 participants.
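The abstract mentions fusion and calibration of subsystem scores via logistic regression. As a rough illustration of that idea, the sketch below trains a linear logistic-regression fusion of several subsystem scores with plain gradient descent on synthetic data. This is a minimal, self-contained example, not the authors' actual fusion pipeline: the function names, learning rate, and synthetic score distributions are all assumptions made here for demonstration.

```python
import math
import random

def train_fusion(scores, labels, lr=0.5, epochs=2000):
    """Fit weights w and offset b so that sigmoid(w . x + b)
    approximates P(target | subsystem scores x).
    scores: list of per-trial score vectors (one entry per subsystem)
    labels: 1 for target trials, 0 for non-target trials
    NOTE: illustrative batch gradient descent, not the paper's optimizer."""
    n = len(scores[0])
    w = [0.0] * n
    b = 0.0
    m = len(scores)
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, y in zip(scores, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the cross-entropy loss w.r.t. z
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * gi / m for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def fused_score(w, b, x):
    """Fused detection score (log-odds) for one trial."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Synthetic development scores from three hypothetical subsystems:
# target trials score higher on average than non-target trials.
random.seed(0)
dev_scores, dev_labels = [], []
for _ in range(200):
    dev_scores.append([random.gauss(2.0, 1.0) for _ in range(3)])
    dev_labels.append(1)
    dev_scores.append([random.gauss(0.0, 1.0) for _ in range(3)])
    dev_labels.append(0)

w, b = train_fusion(dev_scores, dev_labels)
target_like = fused_score(w, b, [2.0, 2.0, 2.0])
nontarget_like = fused_score(w, b, [0.0, 0.0, 0.0])
```

Because the fused score is an affine combination of subsystem scores passed through a monotone link, it can be read as a calibrated log-likelihood ratio and thresholded directly, which is the practical appeal of logistic-regression fusion.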