I-vector based speaker recognition using advanced channel compensation techniques

Authors:
Ahilan Kanagasundaram;David Dean;Sridha Sridharan;Mitchell McLaren;Robbie Vogt
Venue:
Computer Speech and Language
Year:
2014

Abstract

This paper investigates advanced channel compensation techniques for improving i-vector speaker verification performance in the presence of high intersession variability, using the NIST 2008 and 2010 SRE corpora. Four channel compensation techniques are investigated: (a) weighted maximum margin criterion (WMMC), (b) source-normalized WMMC (SN-WMMC), (c) weighted linear discriminant analysis (WLDA) and (d) source-normalized WLDA (SN-WLDA). We show that, by extracting the discriminatory information between pairs of speakers as well as capturing the source variation information in the development i-vector space, the SN-WLDA based cosine similarity scoring (CSS) i-vector system provides over 20% improvement in EER for NIST 2008 interview and microphone verification and over 10% improvement in EER for NIST 2008 telephone verification, compared to the SN-LDA based CSS i-vector system. Further, score-level fusion techniques are analyzed to combine the best channel compensation approaches, providing over 8% improvement in DCF over the best single approach, SN-WLDA, for the NIST 2008 interview/telephone enrolment-verification condition. Finally, we demonstrate that the improvements found in the context of CSS also generalize to state-of-the-art GPLDA, with up to 14% relative improvement in EER for NIST SRE 2010 interview and microphone verification and over 7% relative improvement in EER for NIST SRE 2010 telephone verification.
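
As a rough illustration of two quantities the abstract refers to, the sketch below shows cosine similarity scoring between two i-vectors after a linear channel compensation projection, and how a relative improvement in an error metric such as EER or DCF is computed. It is a minimal sketch under stated assumptions, not the authors' implementation: the projection matrix `A` stands in for whatever matrix a technique such as SN-WLDA would estimate, and the function names and toy numbers are hypothetical, chosen only for illustration.

```python
import math


def project(A, w):
    """Apply a projection matrix A (list of rows) to an i-vector w."""
    return [sum(a * x for a, x in zip(row, w)) for row in A]


def cosine_score(A, w_target, w_test):
    """Cosine similarity between two i-vectors after projection by A.

    A is assumed to be a channel compensation matrix estimated elsewhere
    (for example by LDA-style training); here it is just an input.
    """
    u = project(A, w_target)
    v = project(A, w_test)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def relative_improvement(baseline, new):
    """Relative reduction of an error metric (e.g. EER or DCF), in percent."""
    return 100.0 * (baseline - new) / baseline


# Toy values only: an identity projection and two-dimensional "i-vectors".
identity = [[1.0, 0.0], [0.0, 1.0]]
same_direction = cosine_score(identity, [1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_score(identity, [1.0, 0.0], [0.0, 3.0])

# Illustrative error rates, not figures from the paper.
gain = relative_improvement(5.0, 4.0)
```

In a verification system of this kind, the cosine score would be compared against a threshold to accept or reject the claimed identity; the percentages quoted in the abstract are relative reductions of the kind computed by `relative_improvement`.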