Constrained temporal structure for text-dependent speaker verification

Authors:
Anthony Larcher;Jean-Francois Bonastre;John S. D. Mason
Affiliations:
University of Avignon, LIA-CERI, 84911 Avignon Cedex 9, France;University of Avignon, LIA-CERI, 84911 Avignon Cedex 9, France;Speech and Image Research, School of Engineering, Swansea University, Swansea SA2 8PP, UK
Venue:
Digital Signal Processing
Year:
2013

Abstract

On mobile devices, speaker recognition engines may suffer from ergonomic constraints and a limited amount of computing resources. Although GMM/UBM systems have proved efficient in classical contexts, they show their limitations when the quantity of speech data is restricted. In contrast, the proposed GMM/UBM extension addresses situations characterised by limited enrolment data and only the computing power typically found on modern mobile devices. A key contribution comes from harnessing the temporal structure of speech using client-customised pass-phrases and new Markov model structures. This additional temporal information is then used with Viterbi decoding to enhance discrimination, increasing the gap between client and imposter scores. Experiments on the MyIdea database are presented, with a standard GMM/UBM configuration acting as a benchmark. When imposters do not know the client pass-phrase, a relative gain of up to 65% in terms of equal error rate (EER) is achieved over the GMM/UBM baseline configuration. The results highlight the potential of this new approach, which offers a good balance between complexity and recognition accuracy.
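
To illustrate how Viterbi decoding can exploit the temporal structure of a pass-phrase, the following is a minimal sketch of decoding over a strict left-to-right Markov chain. It is not the authors' model structure, which the abstract does not specify. The function name `viterbi_left_to_right`, the shared `log_stay` and `log_next` transition scores, and the assumption that per-frame state log-likelihoods are already available (for example from state-specific GMMs) are all hypothetical choices made for illustration.

```python
import math

NEG_INF = float("-inf")


def viterbi_left_to_right(log_obs, log_stay, log_next):
    """Best path through a left-to-right chain (illustrative sketch only).

    log_obs[t][s] is the log-likelihood of frame t under state s.
    The path must start in state 0 and end in the last state; at each
    frame it may either stay in its state or advance by exactly one.
    Returns (best_log_score, state_path).
    """
    n_frames = len(log_obs)
    n_states = len(log_obs[0])

    # delta[s]: best log score of any allowed path ending in state s
    delta = [NEG_INF] * n_states
    delta[0] = log_obs[0][0]
    backpointers = []

    for t in range(1, n_frames):
        new_delta = [NEG_INF] * n_states
        ptr = [0] * n_states
        for s in range(n_states):
            stay = delta[s] + log_stay
            move = delta[s - 1] + log_next if s > 0 else NEG_INF
            if move > stay:
                new_delta[s], ptr[s] = move + log_obs[t][s], s - 1
            else:
                new_delta[s], ptr[s] = stay + log_obs[t][s], s
        delta = new_delta
        backpointers.append(ptr)

    # Backtrack from the final state to recover the frame-to-state alignment
    state = n_states - 1
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return delta[-1], path


# Toy example: 4 frames, 2 states, equal stay/advance probabilities
toy_log_obs = [[-1.0, -5.0], [-1.0, -5.0], [-5.0, -1.0], [-5.0, -1.0]]
toy_score, toy_path = viterbi_left_to_right(
    toy_log_obs, math.log(0.5), math.log(0.5)
)
```

In a verification setting, a score of this kind would typically be compared with a score from a background model, so that an utterance whose frames do not follow the expected state order is penalised. How the paper combines these scores is not stated in the abstract.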