Application of new qualitative voicing time-frequency features for speaker recognition

  • Authors:
  • Nidhal Ben Aloui;Hervé Glotin;Patrick Hebrard

  • Affiliations:
  • Nidhal Ben Aloui: Université du Sud Toulon-Var, Laboratoire LSIS, La Garde, France and DCNS, Division SIS, Toulon, France; Hervé Glotin: Université du Sud Toulon-Var, Laboratoire LSIS, La Garde, France; Patrick Hebrard: DCNS, Division SIS, Toulon, France

  • Venue:
  • ICB'07 Proceedings of the 2007 international conference on Advances in Biometrics
  • Year:
  • 2007

Abstract

This paper presents original and efficient Qualitative Time-Frequency (QTF) speech features for speaker recognition, based on a qualitative representation of mid-term speech dynamics. For each frame of around 150 ms, we estimate and binarize the voicing activity of 6 frequency subbands. We then derive the graph of Allen temporal relations between these 6 time intervals. This set of temporal relations, estimated for each frame, feeds a neural network trained for speaker recognition. Experiments are conducted on fifty speakers (males and females) from the ESTER reference radio database (40 hours of continuous speech). Our best model yields around 3% frame classification error, without using any frame-continuity information, which is comparable to the state of the art. Moreover, QTF yields a simple and compact representation that codes speaker identity with only 15 integers.
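The core idea above can be sketched in code. The paper does not give an implementation, so the following is a minimal illustration, assuming standard Allen interval relations and representing each subband's voiced activity as a single `(start, end)` interval per frame; the function names (`allen_relation`, `qtf_code`) are hypothetical. Note that 6 intervals give C(6, 2) = 15 pairwise relations, which matches the 15 integers mentioned in the abstract.

```python
from itertools import combinations

# The 13 Allen relations: 7 direct ones plus 6 inverses ("equals" is its own inverse).
RELATIONS = ["before", "meets", "overlaps", "starts", "during", "finishes", "equals",
             "before-inv", "meets-inv", "overlaps-inv", "starts-inv", "during-inv",
             "finishes-inv"]

def allen_relation(a, b):
    """Classify the Allen temporal relation between intervals a and b,
    each a (start, end) pair with start < end."""
    a1, a2 = a
    b1, b2 = b
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a1 < b1 < a2 < b2:
        return "overlaps"
    if a1 == b1 and a2 < b2:
        return "starts"
    if b1 < a1 and a2 < b2:
        return "during"
    if b1 < a1 and a2 == b2:
        return "finishes"
    if a1 == b1 and a2 == b2:
        return "equals"
    # None of the direct relations holds, so the inverse of one must:
    # classify the swapped pair and mark it as an inverse relation.
    return allen_relation(b, a) + "-inv"

def qtf_code(intervals):
    """Encode 6 voicing intervals (one per subband) as 15 integers:
    the Allen relation index for each of the C(6, 2) = 15 interval pairs."""
    return [RELATIONS.index(allen_relation(a, b))
            for a, b in combinations(intervals, 2)]

# Hypothetical voicing intervals (in ms within a frame) for 6 subbands:
frame = [(0, 40), (10, 60), (40, 90), (0, 120), (70, 110), (95, 150)]
code = qtf_code(frame)   # 15 integers, one per interval pair
```

In the paper, one such 15-integer vector per frame is the feature fed to the neural network classifier.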