Discrimination Power of Vocal Source and Vocal Tract Related Features for Speaker Segmentation

Authors:
Wai Nang Chan;Nengheng Zheng;Tan Lee
Affiliations:
Chinese Univ. of Hong Kong, Hong Kong;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Abstract

This paper analyzes the speaker discrimination power of vocal source related features in comparison with conventional vocal tract related features. The vocal source features, named wavelet octave coefficients of residues (WOCOR), are extracted by applying a pitch-synchronous wavelet transform to the linear predictive (LP) residual signal. A series of controlled experiments shows that WOCOR is less sensitive to spoken content than conventional mel-frequency cepstral coefficient (MFCC) features and is thus more discriminative when the amount of training data is limited. These advantages of WOCOR are exploited in speaker segmentation of telephone conversations, where statistical speaker models must be built from short speech segments. Experimental results show that the proposed use of WOCOR leads to a noticeable reduction of segmentation errors.
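As a rough illustration of the first stage mentioned in the abstract, the sketch below computes an LP residual for a single frame. It is not the authors' implementation: the autocorrelation method with a Levinson-Durbin recursion, the function names `lp_coefficients` and `lp_residual`, and the choice of LP order are all assumptions made here. The pitch-synchronous wavelet transform that turns the residual into WOCOR is not shown, because the abstract does not specify the wavelet, the pitch marking, or the octave grouping.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate LP coefficients with the autocorrelation method.

    Returns a = [1, a_1, ..., a_order] such that the prediction error is
    e[n] = x[n] + a_1 x[n-1] + ... + a_order x[n-order].
    (Hypothetical helper; the paper's exact LP analysis settings are unknown.)
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame: stop the recursion
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order=12):
    """Inverse-filter the frame with A(z) to obtain the LP residual.

    order=12 is an assumed default, not a value taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    # FIR filtering with zero initial state, truncated to the frame length.
    return np.convolve(frame, a)[: len(frame)]
```

In a full pipeline under these assumptions, `lp_residual` would be applied frame by frame to voiced speech, and the resulting residual would then be analyzed pitch-synchronously to obtain the vocal source features.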