Environment adaptation for robust speaker verification by cascading maximum likelihood linear regression and reinforced learning

Authors:
K. K. Yiu; M. W. Mak; S. Y. Kung
Affiliations:
Department of Electronic and Information Engineering, Center for Multimedia Signal Processing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong; Department of Electronic and Information Engineering, Center for Multimedia Signal Processing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong; Department of Electrical Engineering, Princeton University, United States
Venue:
Computer Speech and Language
Year:
2007

Abstract

In speaker verification over public telephone networks, utterances can be obtained from different types of handsets, and each handset may introduce a different degree of distortion to the speech signal. This paper combines a handset selector with (1) handset-specific transformations, (2) reinforced learning, and (3) stochastic feature transformation to reduce the effect of this acoustic distortion. Specifically, during training, the clean speaker models and background models are first transformed by handset-specific transformations based on maximum likelihood linear regression (MLLR), using a small amount of distorted speech data. Reinforced learning is then applied to adapt the transformed models to handset-dependent speaker models and handset-dependent background models, using stochastically transformed speaker patterns. During a verification session, a GMM-based handset classifier identifies the most likely handset used by the claimant, and the corresponding handset-dependent speaker and background model pair is used for verification. Experimental results based on 150 speakers of the HTIMIT corpus show that environment adaptation based on the combination of MLLR, reinforced learning, and feature transformation outperforms CMS, Hnorm, Tnorm, and speaker model synthesis.
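The verification session described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes diagonal-covariance GMMs stored as lists of `(weight, mean, variance)` tuples, an affine mean update of the form mu' = A mu + b for the MLLR step, and a log-likelihood-ratio score against a hypothetical threshold, none of which are specified in the abstract. The function names are illustrative, and the reinforced learning and stochastic feature transformation stages are not shown.

```python
import math


def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood under a GMM of (weight, mean, var) tuples."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in gmm]
        top = max(comps)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)


def mllr_transform_means(gmm, A, b):
    """Assumed MLLR-style mean update: mu' = A mu + b for every component."""
    out = []
    for w, m, v in gmm:
        new_m = [sum(a * mj for a, mj in zip(row, m)) + bi for row, bi in zip(A, b)]
        out.append((w, new_m, v))
    return out


def verify(frames, handset_gmms, speaker_models, background_models, threshold=0.0):
    """Pick the most likely handset, then score with that handset's model pair."""
    handset = max(handset_gmms, key=lambda h: gmm_avg_loglik(frames, handset_gmms[h]))
    score = gmm_avg_loglik(frames, speaker_models[handset]) - gmm_avg_loglik(
        frames, background_models[handset]
    )
    return handset, score, score > threshold  # threshold is a hypothetical value


# Toy one-dimensional example with two hypothetical handsets "A" and "B".
handset_gmms = {"A": [(1.0, [0.0], [1.0])], "B": [(1.0, [5.0], [1.0])]}
speaker_models = {"A": [(1.0, [0.0], [1.0])], "B": [(1.0, [5.0], [1.0])]}
background_models = {"A": [(1.0, [-2.0], [1.0])], "B": [(1.0, [3.0], [1.0])]}
frames = [[5.1], [4.9]]
handset, score, accepted = verify(frames, handset_gmms, speaker_models, background_models)
```

In this toy setup the frames lie near the mean of handset "B", so the claimant is scored against the "B" speaker and background models only. Restricting the score to one handset-dependent model pair is the design choice the abstract describes; the specific scoring rule here is an assumption.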