Mismatch modeling and compensation for robust speaker verification

Authors:
Yun Lei;John H. L. Hansen
Affiliations:
Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, TX, USA
Venue:
Speech Communication
Year:
2011

Abstract

In this study, the primary channel mismatch scenarios between enrollment and test conditions in a speaker verification task are analyzed and modeled. A novel frame-based compensation model for the mismatch, built on Gaussian mixture modeling with a universal background model (GMM-UBM), is formulated and evaluated using National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) 2008 data, along with a comparison to the well-known eigenchannel model. The proposed compensation method shows significant improvement over the eigenchannel model when only the supervector of the UBM is employed. Here, the supervector of the enrollment speaker model is not included in the estimation of the mismatch, since it is difficult to obtain the true supervector of the speaker from only the limited 5 min of channel-dependent speech data. The proposed mismatch compensation model therefore shows that a supervector constructed from the UBM can more accurately describe the mismatch between enrollment and test data, resulting in an effective improvement in classification performance for speaker/speech applications.
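For readers unfamiliar with the terminology, the following is a minimal sketch of two ideas the abstract relies on: a GMM supervector (the component mean vectors concatenated into one long vector) and a generic eigenchannel-style low-rank offset of the form m + Ux. It is not the compensation model proposed in the paper, whose details are not given here. The function names, the toy dimensions, the random data, and the unweighted least-squares estimate are all assumptions made for illustration; practical systems typically weight the estimate by component occupancies and covariances.

```python
import numpy as np


def supervector(means):
    """Concatenate the (C, D) component means of a GMM into one (C*D,) vector."""
    return np.asarray(means, dtype=float).reshape(-1)


def estimate_channel_factor(U, offset):
    """Unweighted least-squares estimate of x in offset ~= U @ x.

    Assumption: a simplification for illustration only; real eigenchannel
    estimation uses occupancy- and covariance-weighted statistics.
    """
    x, *_ = np.linalg.lstsq(U, offset, rcond=None)
    return x


# Toy setup (all values hypothetical): 4 mixture components, 3-dim features,
# and a rank-2 channel subspace U.
rng = np.random.default_rng(0)
C, D, R = 4, 3, 2
ubm_means = rng.normal(size=(C, D))
m = supervector(ubm_means)             # UBM supervector, shape (12,)
U = rng.normal(size=(C * D, R))        # hypothetical eigenchannel matrix

# Simulate a channel-shifted supervector, then estimate and remove the shift
# using only the UBM supervector as the reference point.
x_true = np.array([0.5, -1.0])
shifted = m + U @ x_true
x_hat = estimate_channel_factor(U, shifted - m)
compensated = shifted - U @ x_hat
```

In this toy case the offset lies exactly in the span of U, so the estimated factor matches the simulated one and the compensated vector returns to the UBM supervector. With real speech data the offset is only approximately low-rank, and the quality of the reference supervector matters, which is the issue the abstract raises about limited enrollment data.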