In-set/out-of-set speaker recognition in sustained acoustic scenarios using sparse data

Authors:
John H. L. Hansen;Jun-Won Suh;Matthew R. Leonard
Venue:
Speech Communication
Year:
2013

Abstract

This study addresses the problem of distinguishing in-set from out-of-set speakers in noise when train and test durations are limited and rapid detection and tracking are required. The objective is to decide whether the current input speaker is accepted as a member of an enrolled in-set group or rejected as an outside speaker. A new scoring algorithm is developed that combines log-likelihood scores across an energy-frequency grid, fusing scores from high-energy, speaker-dependent frames with weighted scores from low-energy, noise-dependent frames. By leveraging the balance between the speaker and the background noise environment, an improvement in overall equal error rate (EER) performance can be realized. Using speakers from the TIMIT database with 5 s of train and 2 s of test data, the average optimum relative EER improvement for the proposed full selective leveraging approach is +31.6%. With 10 s of NIST SRE-2008 data, the optimum relative EER improvement using the proposed approach is +10.8%. The results confirm that, when the background environment type remains constant between train and test, an in-set/out-of-set speaker recognition system can be formulated that takes advantage of information gathered from the environmental noise and realizes significant improvement when only extremely limited amounts of train/test data are available.
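
The fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it splits frames along the energy dimension only (the abstract describes an energy-frequency grid), and the function names, the energy threshold, the noise weight, and the per-frame normalization are all assumptions made for illustration. The last helper shows how a relative EER improvement such as +31.6% could be computed from a baseline and a proposed EER, assuming the usual definition of relative improvement.

```python
import numpy as np


def fused_score(frame_energy, speaker_ll, noise_ll, energy_threshold, noise_weight):
    """Fuse per-frame log-likelihood scores according to frame energy.

    Hypothetical helper (not from the paper): frames whose energy is at or
    above energy_threshold contribute their speaker-model log likelihood;
    the remaining low-energy frames contribute their noise-model log
    likelihood scaled by noise_weight. The sum is averaged over all frames.
    """
    frame_energy = np.asarray(frame_energy, dtype=float)
    speaker_ll = np.asarray(speaker_ll, dtype=float)
    noise_ll = np.asarray(noise_ll, dtype=float)

    high = frame_energy >= energy_threshold  # boolean mask of high-energy frames
    total = speaker_ll[high].sum() + noise_weight * noise_ll[~high].sum()
    return total / frame_energy.size


def is_in_set(score, decision_threshold):
    """Accept the speaker as in-set when the fused score reaches the threshold."""
    return score >= decision_threshold


def relative_eer_improvement(baseline_eer, proposed_eer):
    """Relative EER improvement in percent (assumed definition)."""
    return 100.0 * (baseline_eer - proposed_eer) / baseline_eer


# Toy values, chosen only to exercise the functions.
score = fused_score(
    frame_energy=[1.0, 0.1, 2.0, 0.2],
    speaker_ll=[-1.0, -5.0, -2.0, -6.0],
    noise_ll=[-9.0, -3.0, -8.0, -4.0],
    energy_threshold=0.5,
    noise_weight=0.5,
)
accepted = is_in_set(score, decision_threshold=-2.0)
```

In this sketch, raising `noise_weight` gives the low-energy frames more influence on the decision, which is only useful under the condition the abstract states: the background environment type stays the same between train and test.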