On the use of complementary spectral features for speaker recognition

Authors:
Danoush Hosseinzadeh;Sridhar Krishnan
Affiliations:
Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada;Department of Electrical and Computer Engineering, Ryerson University, Toronto, ON, Canada
Venue:
EURASIP Journal on Advances in Signal Processing
Year:
2008

Abstract

The most popular features for speaker recognition are Mel frequency cepstral coefficients (MFCCs) and linear prediction cepstral coefficients (LPCCs). These features are used extensively because they characterize the vocal tract configuration, which is known to be highly speaker-dependent. In this work, several features are introduced that also characterize the vocal system, in order to complement the traditional features and produce better speaker recognition models. The spectral centroid (SC), spectral bandwidth (SBW), spectral band energy (SBE), spectral crest factor (SCF), spectral flatness measure (SFM), Shannon entropy (SE), and Renyi entropy (RE) were used for this purpose. This work demonstrates that these features are robust in noisy conditions by simulating some common distortions found in the speaker's environment and in a typical telephone channel. Babble noise, additive white Gaussian noise (AWGN), and a bandpass channel with 1 dB of ripple were used to simulate these noisy conditions. The results show significant improvements in classification performance for all noise conditions when these features were used to complement the MFCC and ΔMFCC features. In particular, the SC and SCF improved performance in almost all noise conditions within the examined SNR range (10-40 dB). For example, in cases where there was only one source of distortion, classification improvements of up to 8% and 10% were achieved under babble noise and AWGN, respectively, using the SCF feature.
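To make the feature list above concrete, the following is a minimal sketch of how the seven spectral features could be computed for a single speech frame. The abstract does not give the definitions used in the paper, so the code assumes common textbook forms computed over the full band. The function name `spectral_features`, the Hamming window, the Renyi order `alpha=2`, and the use of the normalized power spectrum as a probability mass function are all illustrative assumptions, not the authors' method.

```python
import numpy as np


def spectral_features(frame, sample_rate, alpha=2.0, eps=1e-12):
    """Illustrative full-band spectral features for one frame.

    Assumed textbook definitions; the paper's exact formulas (for example,
    per-subband computation) may differ.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    # Power spectrum of the non-negative frequencies; eps avoids log(0).
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    band_energy = power.sum()
    # Normalized spectrum, treated as a probability mass function (assumption).
    p = power / band_energy

    centroid = np.sum(freqs * p)
    bandwidth = np.sqrt(np.sum((freqs - centroid) ** 2 * p))
    crest = power.max() / power.mean()
    # Flatness: geometric mean divided by arithmetic mean of the spectrum.
    flatness = np.exp(np.mean(np.log(power))) / power.mean()
    shannon = -np.sum(p * np.log2(p))
    renyi = np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

    return {
        "SC": centroid,
        "SBW": bandwidth,
        "SBE": band_energy,
        "SCF": crest,
        "SFM": flatness,
        "SE": shannon,
        "RE": renyi,
    }
```

In a system like the one described, such values would presumably be computed frame by frame and appended to the MFCC and ΔMFCC vectors before model training; that concatenation step is not shown here.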