Analysis and compensation of Lombard speech across noise type and levels with application to in-set/out-of-set speaker recognition

Authors:
John H. L. Hansen; Vaishnevi Varadarajan
Affiliations:
Center for Robust Speech Systems, University of Texas at Dallas, Richardson, TX; Engine Systems Division, Caterpillar, Inc., Mossville, IL and Center for Robust Speech Systems, University of Texas at Dallas, Richardson, TX
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2009

Abstract

Speech production in the presence of noise results in the Lombard Effect, which is known to have a serious impact on speech system performance. In this study, Lombard speech produced under different types and levels of noise is analyzed in terms of duration, energy histogram, and spectral tilt. Acoustic-phonetic differences are shown to exist between different "flavors" of Lombard speech based on analysis of trends from a Gaussian mixture model (GMM)-based Lombard speech type classifier. For the first time, the dependence of Lombard speech on noise type and noise level is established for the purposes of speech processing systems. Also, the impact of the different flavors of Lombard Effect on speech system performance is shown with respect to an in-set/out-of-set speaker recognition task. System performance is shown to degrade from an equal error rate (EER) of 7.0% under matched neutral training and testing conditions, to an average EER of 26.92% when trained with neutral and tested with Lombard Effect speech. Furthermore, improvement in the performance of in-set/out-of-set speaker recognition is demonstrated by adapting neutral speaker models with Lombard speech data of limited duration. Improved average EERs of 4.75% and 12.37% were achieved for matched and mismatched adaptation and testing conditions, respectively. At the highest noise levels, an EER as low as 1.78% was obtained by adapting neutral speaker models with Lombard speech of limited duration. The study therefore illustrates the impact of Lombard Effect on speaker recognition, and effective methods to improve system performance for speaker recognition when train/test conditions are mismatched for neutral versus Lombard Effect speech.
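The results above are reported as equal error rates (EERs). The abstract does not describe how the EER was computed, so the following is only a minimal sketch of one common definition, not the authors' procedure: sweep a decision threshold over the trial scores and report the operating point where the false acceptance rate (out-of-set speakers accepted) is closest to the false rejection rate (in-set speakers rejected). The function name `equal_error_rate` and the example scores are hypothetical and chosen purely for illustration.

```python
def equal_error_rate(in_set_scores, out_of_set_scores):
    """Approximate the EER from two lists of trial scores.

    Assumption: higher scores mean "more likely in-set", and a trial is
    accepted when its score is >= the threshold. Returns (eer, threshold).
    """
    if not in_set_scores or not out_of_set_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(in_set_scores) | set(out_of_set_scores))

    best = None  # (gap between FAR and FRR, eer estimate, threshold)
    for t in thresholds:
        # False rejection rate: in-set trials scoring below the threshold.
        frr = sum(s < t for s in in_set_scores) / len(in_set_scores)
        # False acceptance rate: out-of-set trials at or above the threshold.
        far = sum(s >= t for s in out_of_set_scores) / len(out_of_set_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finite data FAR and FRR rarely coincide exactly, so the
            # EER is taken as their average at the closest crossing.
            best = (gap, (far + frr) / 2.0, t)

    return best[1], best[2]


# Hypothetical scores, for illustration only (not data from the study).
in_set = [0.9, 0.8, 0.4, 0.7]
out_of_set = [0.1, 0.5, 0.3, 0.2]
eer, threshold = equal_error_rate(in_set, out_of_set)
# Here eer is 0.25 at threshold 0.5: one in-set trial is rejected and
# one out-of-set trial is accepted, out of four trials each.
```

Under this definition, the reported degradation from 7.0% to an average of 26.92% EER means that, at the balanced operating point, roughly one trial in four is misclassified in each direction when neutral-trained models are tested on Lombard speech.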