Robust speaker verification with state duration modeling

Authors:
Nestor Becerra Yoma;Tarciano Facco Pegoraro
Affiliations:
Electrical Engineering Department, University of Chile, Av. Tupper 2007, P.O. Box 412-3, Santiago, Chile;Ericsson do Brasil, Rodovia Ermênio de Oliveira Penteado km 55,5, Indaiatuba, SP, Brazil
Venue:
Speech Communication
Year:
2002

Abstract

This paper addresses state duration modeling in the Viterbi algorithm for a text-dependent speaker verification task. The results suggest that temporal constraints can reduce error rates by 10% and 20% for signals corrupted by noise at SNRs of 6 dB and 0 dB, respectively. They also suggest that accurate statistical modeling of state duration (e.g. with a gamma probability distribution) does not seem to be very relevant once maximum and minimum state duration restrictions are imposed. In contrast, temporal restrictions do not seem to give any improvement with clean speech or at high SNR. It is also shown that state duration constraints can easily be applied with likelihood normalization metrics based on speaker-dependent temporal parameters. Finally, the results show that word position-dependent state duration parameters give no significant improvement over the word position-independent approach when the coarticulation effect between contiguous words is low.
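As an illustration of what maximum and minimum state duration restrictions in the Viterbi search can look like, the following is a minimal sketch and not the authors' implementation. It assumes a strictly left-to-right HMM without skips, in which each state is visited exactly once, and it ignores transition probabilities. The function name `constrained_viterbi`, the input layout `log_b[t][j]`, and the optional `log_dur` duration score (where a gamma log-density could be plugged in) are all hypothetical names introduced here.

```python
NEG_INF = float("-inf")


def constrained_viterbi(log_b, d_min, d_max, log_dur=None):
    """Best left-to-right alignment under hard per-state duration bounds.

    log_b[t][j]  : log-likelihood of frame t under state j (assumed input layout).
    d_min, d_max : per-state minimum and maximum durations in frames (d_min >= 1).
    log_dur      : optional hypothetical function (state, duration) -> log score,
                   e.g. a gamma log-density; None means hard bounds only.
    Returns (best_log_score, durations), or (-inf, None) if nothing is feasible.
    """
    T, N = len(log_b), len(d_min)

    # prefix[j][t] holds the sum of log_b[0..t-1][j], so a segment sum is O(1).
    prefix = [[0.0] * (T + 1) for _ in range(N)]
    for j in range(N):
        for t in range(T):
            prefix[j][t + 1] = prefix[j][t] + log_b[t][j]

    # score[j][t]: best log score with the first j states covering the first t frames.
    score = [[NEG_INF] * (T + 1) for _ in range(N + 1)]
    back = [[0] * (T + 1) for _ in range(N + 1)]
    score[0][0] = 0.0

    for j in range(N):
        for t in range(1, T + 1):
            # Only durations inside [d_min[j], d_max[j]] are ever considered.
            for d in range(d_min[j], min(d_max[j], t) + 1):
                prev = score[j][t - d]
                if prev == NEG_INF:
                    continue
                cand = prev + prefix[j][t] - prefix[j][t - d]
                if log_dur is not None:
                    cand += log_dur(j, d)
                if cand > score[j + 1][t]:
                    score[j + 1][t] = cand
                    back[j + 1][t] = d

    if score[N][T] == NEG_INF:
        return NEG_INF, None

    # Trace back the duration chosen for each state, last state first.
    durations, t = [], T
    for j in range(N, 0, -1):
        durations.append(back[j][t])
        t -= back[j][t]
    return score[N][T], durations[::-1]


# Toy data: frame 0 fits state 0, the remaining five frames fit state 1.
toy_log_b = [[0, -5]] + [[-5, 0]] * 5
loose = constrained_viterbi(toy_log_b, [1, 1], [6, 6])
tight = constrained_viterbi(toy_log_b, [2, 2], [4, 4])
```

In this toy case the loose bounds allow state 0 to last a single frame, while the tighter bounds force it to hold for at least two frames at some cost in likelihood. With noisy observations, bounds of this kind rule out alignments whose durations are implausible for the speaker, which is the intuition behind the temporal constraints discussed in the abstract.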