Robust speaker recognition in cross-channel condition based on Gaussian mixture model

Authors:
Yuxiang Shan;Jia Liu
Affiliations:
Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing, China 100084;Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing, China 100084
Venue:
Multimedia Tools and Applications
Year:
2011

Citing 4
Cited 0

Mixtures of probabilistic principal component analyzers

Neural Computation
Modeling Prosodic Features With Joint Factor Analysis for Speaker Verification

IEEE Transactions on Audio, Speech, and Language Processing
Speaker Recognition With Session Variability Normalization Based on MLLR Adaptation Transforms

IEEE Transactions on Audio, Speech, and Language Processing
A Study of Interspeaker Variability in Speaker Verification

IEEE Transactions on Audio, Speech, and Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

One of the most difficult challenges for speaker recognition is dealing with channel variability. In this paper, several new cross-channel compensation techniques are introduced for a Gaussian mixture model--universal background model (GMM-UBM) speaker verification system. These new techniques include wideband noise reduction, echo cancellation, a simplified feature-domain latent factor analysis (LFA) and data-driven score normalization. A novel dynamic Gaussian selection algorithm is developed to reduce the feature compensation time by more than 60% without any performance loss. The performance of different techniques across varying channel train/test conditions are presented and discussed, finding that speech enhancement, which used to be neglected for telephone speech, is essential for cross-channel tasks, and the channel compensation techniques developed for telephone channel speech also perform effectively. The per microphone performance analysis further shows that speech enhancement can boost the effects of other techniques greatly, especially on channels with larger signal-to-noise ratio (SNR) variance. All results are presented on NIST SRE 2006 and 2008 data, showing a promising performance gain compared to the baseline. The developed system is also compared with other state-of-the-art speaker verification systems. The result shows that the developed system can obtain comparable or even better performance but consumes much less CPU time, making it more suitable for practical use.