A segment selection technique for speaker verification

Authors:
Mohaddeseh Nosratighods;Eliathamby Ambikairajah;Julien Epps;Michael John Carey
Affiliations:
School of Electrical Engineering and Telecommunications, UNSW, Sydney, NSW 2052, Australia;School of Electrical Engineering and Telecommunications, UNSW, Sydney, NSW 2052, Australia and National ICT Australia (NICTA), Australian Technology Park, Eveleigh 1430, Australia;School of Electrical Engineering and Telecommunications, UNSW, Sydney, NSW 2052, Australia;Department of Electronic, Electrical and Computer Engineering, The University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
Venue:
Speech Communication
Year:
2010

Abstract

The performance of speaker verification systems degrades considerably when the test segments are utterances of very short duration. This may be due either to variations in score matching arising from speech sounds that are not observed in short utterances, or to the fact that the shorter the utterance, the greater the effect of individual speech sounds on the average likelihood score. In other words, in very short utterances the effects of individual speech sounds are not averaged out over a large number of speech sounds. This paper presents a score-based segment selection technique for discarding portions of speech that result in poor discrimination ability in a speaker verification task. Theory is developed to detect the most significant and reliable speech segments based on the probability that the test segment comes from a fixed set of cohort models. This approach, suitable for any duration of test utterance, reduces the effect of acoustic regions of the speech that are not accurately modelled due to sparse training data, and makes a decision based only on the segments that provide the best-matched scores from the segment selection algorithm. The proposed segment selection technique provides relative reductions of 22% in minimum Detection Cost Function (DCF) and 7% in Equal Error Rate (EER) compared with a baseline that uses segment-based normalization, when evaluated on the short utterances of the NIST 2002 Speaker Recognition Evaluation dataset.
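The abstract gives no formulas, so the following is only a rough sketch of what score-based segment selection against a fixed cohort set might look like, not the authors' algorithm. It assumes that per-segment log-likelihoods are already available for the claimed speaker model and for each cohort model. It also assumes that a segment's reliability can be approximated by its average cohort likelihood, and that a fixed fraction of segments is kept. The names `log_mean_exp`, `select_segments`, `verification_score` and `keep_fraction` are hypothetical.

```python
import math


def log_mean_exp(values):
    """Log of the mean of exp(values), computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def select_segments(cohort_loglikes, keep_fraction=0.5):
    """Return indices of the segments best matched by the cohort set.

    cohort_loglikes[i][k] is the log-likelihood of segment i under
    cohort model k. Assumption: a segment that the cohort models
    explain well lies in a well-modelled acoustic region.
    """
    reliability = [log_mean_exp(row) for row in cohort_loglikes]
    n_keep = max(1, math.ceil(keep_fraction * len(reliability)))
    ranked = sorted(range(len(reliability)),
                    key=lambda i: reliability[i], reverse=True)
    return sorted(ranked[:n_keep])


def verification_score(target_loglikes, cohort_loglikes, keep_fraction=0.5):
    """Average cohort-normalised score over the selected segments only."""
    kept = select_segments(cohort_loglikes, keep_fraction)
    normalised = [target_loglikes[i] - log_mean_exp(cohort_loglikes[i])
                  for i in kept]
    return sum(normalised) / len(normalised)


if __name__ == "__main__":
    # Toy values: segment 2 is poorly matched by every cohort model.
    target = [-10.0, -12.0, -30.0, -11.0]
    cohort = [[-11.0, -11.0], [-12.0, -12.0], [-40.0, -40.0], [-13.0, -13.0]]
    print(select_segments(cohort, 0.5))
    print(verification_score(target, cohort, 0.5))
```

In this toy input the poorly modelled segment would otherwise dominate the averaged score. Discarding it before averaging is the kind of effect the abstract attributes to segment selection. The selection criterion and threshold used in the paper itself may differ.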