The performance of speaker verification systems degrades considerably when the test segments are utterances of very short duration. This may be due either to score variations arising from speech sounds that are not observed in a short utterance, or to the fact that the shorter the utterance, the greater the effect of individual speech sounds on the average likelihood score. In other words, in very short utterances the effects of individual speech sounds are not averaged out over a large number of speech sounds. This paper presents a score-based segment selection technique that discards portions of speech with poor discrimination ability in a speaker verification task. Theory is developed to detect the most significant and reliable speech segments based on the probability that the test segment comes from a fixed set of cohort models. This approach, suitable for test utterances of any duration, reduces the effect of acoustic regions of speech that are not accurately modelled because of sparse training data, and makes a decision based only on the segments that the selection algorithm identifies as providing the best-matched scores. When evaluated on the short utterances of the NIST 2002 Speaker Recognition Evaluation dataset, the proposed segment selection technique provides relative reductions of 22% in minimum Detection Cost Function (DCF) and 7% in Equal Error Rate (EER), compared with a baseline that uses segment-based normalization.
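The general idea of scoring only a subset of segments can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the selection criterion, so the sketch assumes that a segment's reliability is the log of its average likelihood under the cohort models, and that a fixed fraction of the most reliable segments is kept. The names `log_mean_exp`, `select_and_score`, and `keep_fraction` are hypothetical, and the per-segment log-likelihoods are assumed to have been computed beforehand (for example, by GMMs).

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values) / len(values))


def select_and_score(target_ll, cohort_ll, keep_fraction=0.5):
    """Score a test utterance using only its most reliable segments.

    target_ll: per-segment log-likelihoods under the claimed speaker model.
    cohort_ll: per-segment lists of log-likelihoods, one value per cohort model.
    keep_fraction: fraction of segments to keep (an assumed tuning parameter).

    Returns (utterance_score, kept_segment_indices).
    """
    if not target_ll or len(target_ll) != len(cohort_ll):
        raise ValueError("target_ll and cohort_ll must be non-empty and equal in length")

    # Assumed reliability measure: how well the cohort set explains each segment.
    cohort_scores = [log_mean_exp(row) for row in cohort_ll]

    # Keep the segments that the cohort set explains best.
    n_keep = max(1, math.ceil(keep_fraction * len(target_ll)))
    ranked = sorted(range(len(target_ll)), key=lambda i: cohort_scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])

    # Cohort-normalized log-likelihood ratio, averaged over the kept segments only.
    score = sum(target_ll[i] - cohort_scores[i] for i in kept) / n_keep
    return score, kept


# Illustrative values only: the third segment is poorly matched by every model.
target = [-10.0, -12.0, -30.0, -11.0]
cohort = [[-11.0, -12.0], [-12.5, -13.0], [-29.0, -31.0], [-12.0, -11.5]]
score, kept = select_and_score(target, cohort, keep_fraction=0.75)
print(kept, score)
```

With these made-up numbers the poorly modelled third segment is discarded, and the decision score is the average over the remaining three. In practice, the reliability measure and the fraction kept would need to follow the paper's derivation and be tuned on development data.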