A new adaptation approach to high-level speaker-model creation in speaker verification

Authors:
Shi-Xiong Zhang;Man-Wai Mak
Affiliations:
Center for Multimedia Signal Processing, Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR;Center for Multimedia Signal Processing, Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR
Venue:
Speech Communication
Year:
2009

Abstract

Research has shown that speaker verification based on high-level speaker features requires long enrollment utterances to guarantee a low error rate during verification. In practice, however, speaker models are often built from a limited amount of enrollment data, which makes them unreliable. This paper proposes four new adaptation methods for creating high-level speaker models to alleviate this effect. Unlike conventional methods, in which only the phoneme-dependent background model is adapted, the proposed methods also adapt the phoneme-independent speaker model so that all the information available in the training data is used. A proportional factor, derived from the ratio between the phoneme-dependent background model and the phoneme-independent background model, is used to adjust the phoneme-independent speaker models during adaptation. The proposed methods were evaluated under the NIST 2000 and NIST 2002 SRE frameworks. Experimental results show that they alleviate the data-sparseness problem effectively and achieve better performance than traditional MAP adaptation.
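
The abstract does not give the adaptation equations, so the following is only an illustrative sketch of how a proportional factor of this kind might be applied. It assumes that each model is a discrete distribution over a small symbol set, that the factor is the element-wise ratio of the phoneme-dependent to the phoneme-independent background model, and that the scaled phoneme-independent speaker model then serves as the prior in a standard MAP update. The function names, the relevance factor of 16, and the toy numbers are all hypothetical and are not taken from the paper.

```python
import numpy as np


def map_adapt(counts, prior, relevance=16.0):
    """Classical MAP update of a discrete distribution toward a prior.

    `relevance` is an assumed relevance factor, not a value from the paper.
    """
    counts = np.asarray(counts, dtype=float)
    prior = np.asarray(prior, dtype=float)
    n = counts.sum()
    alpha = n / (n + relevance)           # more data -> trust the data more
    ml = counts / n if n > 0 else prior   # maximum-likelihood estimate
    return alpha * ml + (1.0 - alpha) * prior


def scaled_prior(pi_speaker, pd_background, pi_background, eps=1e-12):
    """Scale the phoneme-independent (PI) speaker model by the ratio of the
    phoneme-dependent (PD) background model to the PI background model,
    then renormalize. This is an assumed form of the proportional factor.
    """
    pi_speaker = np.asarray(pi_speaker, dtype=float)
    pd_background = np.asarray(pd_background, dtype=float)
    pi_background = np.asarray(pi_background, dtype=float)
    ratio = pd_background / np.maximum(pi_background, eps)
    scaled = pi_speaker * ratio
    return scaled / scaled.sum()


# Toy example with three symbols (hypothetical numbers).
pi_bkg = np.array([0.5, 0.3, 0.2])   # PI background model
pd_bkg = np.array([0.2, 0.3, 0.5])   # PD background model for one phoneme
pi_spk = np.array([0.6, 0.3, 0.1])   # PI speaker model (already adapted)
pd_counts = [3, 1, 0]                # sparse PD enrollment counts

prior = scaled_prior(pi_spk, pd_bkg, pi_bkg)
pd_spk = map_adapt(pd_counts, prior)
print(pd_spk)
```

Under these assumptions, a phoneme with very few enrollment frames falls back on a prior that already carries speaker information from the phoneme-independent model, rather than on the background model alone.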