This paper presents an analysis of the speaker discrimination power of vocal-source-related features, in comparison with the conventional vocal-tract-related features. The vocal source features, named wavelet octave coefficients of residues (WOCOR), are extracted by applying a pitch-synchronous wavelet transform to the linear predictive (LP) residual signal. A series of controlled experiments shows that WOCOR is less sensitive to spoken content than the conventional MFCC features and is thus more discriminative when the amount of training data is limited. These advantages of WOCOR are exploited in the task of speaker segmentation for telephone conversations, in which statistical speaker models need to be built upon short speech segments. Experimental results show that the proposed use of WOCOR leads to a noticeable reduction in segmentation errors.
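The first stage of the feature extraction described in the abstract, obtaining the LP residual signal, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `lp_coefficients` and `lp_residual`, the prediction order of 12, and the choice of the autocorrelation method are all hypothetical. The subsequent pitch-synchronous wavelet transform that yields WOCOR is not shown.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate LP coefficients a[1..order] by the autocorrelation method.

    The predictor models s[n] as sum_k a[k] * s[n - k].
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Small diagonal loading (an assumption) to keep the system well conditioned.
    R += 1e-9 * r[0] * np.eye(order)
    return np.linalg.solve(R, r[1:])


def lp_residual(frame, order=12):
    """Inverse-filter a frame with A(z) = 1 - sum_k a[k] z^-k."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    inverse_filter = np.concatenate(([1.0], -a))
    # Truncate the convolution so the residual has the same length as the frame.
    return np.convolve(frame, inverse_filter)[: len(frame)]
```

For a frame that is well described by an all-pole model, the residual carries much less energy than the frame itself. What remains is mainly the excitation (vocal source) component, which is the signal the abstract's vocal source features are derived from.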