Durations of Context-Dependent Phonemes: A New Feature in Speaker Verification

  • Authors:
  • Charl Johannes Heerden; Etienne Barnard

  • Affiliations:
  • University of Pretoria, Pretoria Gauteng, South Africa and Human Language Technology Group, Meraka Institute, CSIR, Meiring Naude Rd, Brumeria, Pretoria Gauteng, South Africa; University of Pretoria, Pretoria Gauteng, South Africa and Human Language Technology Group, Meraka Institute, CSIR, Meiring Naude Rd, Brumeria, Pretoria Gauteng, South Africa

  • Venue:
  • Speaker Classification II
  • Year:
  • 2007

Abstract

We present a text-dependent speaker verification system based on Hidden Markov Models (HMMs). A set of features based on the durations of context-dependent phonemes is used to distinguish among speakers. The approach was tested on the YOHO corpus: the HMM-based system achieved an equal error rate (EER) of 0.68% using conventional (acoustic) features alone, and an EER of 0.32% when the duration features were combined with the acoustic features. This compares well with state-of-the-art results on the same test and shows the value of temporal features for speaker verification. These features may also be useful for other purposes, such as detecting replay attacks or improving the robustness of speaker-verification systems to channel or speaker variations. Our results confirm earlier findings on text-independent speaker recognition [1] and text-dependent speaker verification [2] tasks, and we offer a number of suggestions for further improvements.
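
The equal error rate quoted above is the operating point at which the false-acceptance and false-rejection rates coincide. As a rough illustration only, the Python sketch below estimates an EER from two lists of verification scores and shows one possible way of combining an acoustic score with a duration-based score. The abstract does not say how the two feature sets were combined, so the weighted sum in `fuse_scores`, the `weight` parameter, and both function names are assumptions made for this example and not the authors' method.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Estimate the EER from genuine (true-speaker) and impostor scores.

    Higher scores are assumed to mean "more likely the claimed speaker".
    Returns (eer, threshold) at the candidate threshold where the
    false-acceptance and false-rejection rates are closest.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best = None
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostors wrongly accepted
        frr = np.mean(genuine < t)    # true speakers wrongly rejected
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, float(t))
    return best[1], best[2]


def fuse_scores(acoustic, duration, weight=0.5):
    """Hypothetical fusion: weighted sum of acoustic and duration scores.

    This is an assumption for illustration; the source does not specify
    the combination rule.
    """
    acoustic = np.asarray(acoustic, dtype=float)
    duration = np.asarray(duration, dtype=float)
    return (1.0 - weight) * acoustic + weight * duration


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the functions.
    genuine = [0.0, 2.0, 3.0, 4.0]
    impostor = [-1.0, 0.0, 1.0, 3.0]
    eer, threshold = equal_error_rate(genuine, impostor)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

In practice the fusion weight would be tuned on held-out data, and the EER would be computed separately for the acoustic-only and the fused scores to compare them, as the two figures in the abstract do.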