Maintaining knowledge about temporal intervals. Communications of the ACM.
Multi-stream adaptive evidence combination for noise robust ASR. Speech Communication, special issue on noise robust ASR.
Speaker Identification Using Harmonic Structure of LP-residual Spectrum. AVBPA '97: Proceedings of the First International Conference on Audio- and Video-Based Biometric Person Authentication.
A tutorial on text-independent speaker verification. EURASIP Journal on Applied Signal Processing.
An interval-based representation of temporal knowledge. IJCAI '81: Proceedings of the 7th International Joint Conference on Artificial Intelligence, Volume 1.
This paper presents original and efficient Qualitative Time-Frequency (QTF) speech features for speaker recognition, based on a qualitative representation of mid-term speech dynamics. For each frame of around 150 ms, we estimate and binarize the voicing activity of 6 frequency subbands, then derive the graph of Allen temporal relations between the resulting 6 time intervals. This set of temporal relations, computed at each frame, feeds a neural network trained for speaker recognition. Experiments are conducted on fifty speakers (male and female) from the ESTER reference radio broadcast database (40 hours of continuous speech). Our best model yields a frame classification error of around 3%, without using any frame-continuity information, which is comparable to the state of the art. Moreover, QTF provides a simple and light representation, coding speaker identity with only 15 integers.
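As a rough illustration of the idea (not the paper's implementation), the sketch below computes the Allen relation between two time intervals and builds the pairwise relation graph for a frame. The interval values and band names are hypothetical; note that 6 intervals yield C(6,2) = 15 pairwise relations, which matches the 15-integer coding mentioned in the abstract.

```python
from itertools import combinations

def allen_relation(a, b):
    """Return the Allen relation holding from interval a to interval b.

    Intervals are (start, end) pairs with start < end. The 13 Allen
    relations are 6 base relations, their inverses, and "equals".
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start < b_start and b_start < a_end < b_end:
        return "overlaps"
    if a_start == b_start and a_end < b_end:
        return "starts"
    if a_start > b_start and a_end < b_end:
        return "during"
    if a_start > b_start and a_end == b_end:
        return "finishes"
    if a_start == b_start and a_end == b_end:
        return "equals"
    # Otherwise a base relation holds from b to a: return its inverse.
    return allen_relation(b, a) + "-inv"

# Hypothetical binarized voicing intervals (in ms) for 3 of the 6
# subbands within one 150 ms frame:
intervals = {"band0": (0, 60), "band1": (40, 120), "band2": (60, 150)}

# Pairwise relations form the frame's temporal-relations graph.
graph = {(p, q): allen_relation(intervals[p], intervals[q])
         for p, q in combinations(sorted(intervals), 2)}
```

Because each pairwise relation takes one of 13 discrete values, the whole graph for a frame can be stored as a small vector of integers rather than continuous spectral features.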