Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006

  • Authors:
  • N. Brummer;L. Burget;J. H. Cernocky;O. Glembek;F. Grezl;M. Karafiat;D. A. van Leeuwen;P. Matejka;P. Schwarz;A. Strasheim

  • Affiliations:
  • Specscom DataVoice, Stellenbosch;-;-;-;-;-;-;-;-;-

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes and discusses the "STBU" speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The STBU system was a combination of three main kinds of subsystems: 1) GMM, with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features, 2) Gaussian mixture model-support vector machine (GMM-SVM), using GMM mean supervectors as input to an SVM, and 3) maximum-likelihood linear regression-support vector machine (MLLR-SVM), using MLLR speaker adaptation coefficients derived from an English large vocabulary continuous speech recognition (LVCSR) system. All subsystems made use of supervector subspace channel compensation methods-either eigenchannel adaptation or nuisance attribute projection. We document the design and performance of all subsystems, as well as their fusion and calibration via logistic regression. Finally, we also present a cross-site fusion that was done with several additional systems from other NIST SRE-2006 participants.