Maintaining knowledge about temporal intervals. Communications of the ACM.
Multi-stream adaptive evidence combination for noise robust ASR. Speech Communication, special issue on noise robust ASR.
Speaker Identification Using Harmonic Structure of LP-residual Spectrum. AVBPA '97: Proceedings of the First International Conference on Audio- and Video-Based Biometric Person Authentication.
A tutorial on text-independent speaker verification. EURASIP Journal on Applied Signal Processing.
An interval-based representation of temporal knowledge. IJCAI '81: Proceedings of the 7th International Joint Conference on Artificial Intelligence, Volume 1.
This paper presents original and efficient Qualitative Time-Frequency (QTF) speech features for speaker recognition, based on a qualitative representation of mid-term speech dynamics. For each frame of around 150 ms, we estimate and binarize the voicing activity of 6 frequency subbands, then derive the graph of Allen temporal relations between the resulting 6 time intervals. This set of temporal relations, computed at each frame, feeds a neural network trained for speaker recognition. Experiments are conducted on fifty speakers (male and female) from the ESTER reference radio broadcast database (40 hours of continuous speech). Our best model yields a frame classification error of around 3%, without using any frame-continuity information, which is comparable to the state of the art. Moreover, QTF provides a simple and light representation, coding speaker identity with only 15 integers.
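As a rough illustration of the idea (not the paper's implementation), the sketch below computes the Allen relation between two time intervals and builds the pairwise relation graph for a frame. The interval values and band names are hypothetical; note that 6 intervals yield C(6,2) = 15 pairwise relations, which matches the 15-integer coding mentioned in the abstract.

```python
from itertools import combinations

def allen_relation(a, b):
    """Return the Allen relation holding from interval a to interval b.

    Intervals are (start, end) pairs with start < end. The 13 Allen
    relations are 6 base relations, their inverses, and "equals".
    """
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if a_start < b_start and b_start < a_end < b_end:
        return "overlaps"
    if a_start == b_start and a_end < b_end:
        return "starts"
    if a_start > b_start and a_end < b_end:
        return "during"
    if a_start > b_start and a_end == b_end:
        return "finishes"
    if a_start == b_start and a_end == b_end:
        return "equals"
    # Otherwise a base relation holds from b to a: return its inverse.
    return allen_relation(b, a) + "-inv"

# Hypothetical binarized voicing intervals (in ms) for 3 of the 6
# subbands within one 150 ms frame:
intervals = {"band0": (0, 60), "band1": (40, 120), "band2": (60, 150)}

# Pairwise relations form the frame's temporal-relations graph.
graph = {(p, q): allen_relation(intervals[p], intervals[q])
         for p, q in combinations(sorted(intervals), 2)}
```

Because each pairwise relation takes one of 13 discrete values, the whole graph for a frame can be stored as a small vector of integers rather than continuous spectral features.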