Exploration of phase and vocal excitation modulation features for speaker recognition

  • Authors:
  • Ning Wang;P. C. Ching;Tan Lee

  • Affiliations:
  • Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, P. R. China;Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, P. R. China;Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong, P. R. China

  • Venue:
  • CCBR'12 Proceedings of the 7th Chinese conference on Biometric Recognition
  • Year:
  • 2012

Abstract

Mel-frequency cepstral coefficients (MFCCs) are found to be closely related to the linguistic content of speech. Besides cepstral features, other resources in speech, e.g., the phase and the excitation source, are believed to contain useful properties for speaker discrimination. Moreover, magnitude-based features alone are insufficient to provide satisfactory and robust speaker recognition accuracy in real-world applications, where large variations exist between the development and application scenarios. The AM-FM signal modeling technique offers an effective approach to characterizing and analyzing speech properties. This work is therefore motivated to capture the relevant phase-related and vocal-excitation-related modulation features to complement MFCCs. In the context of multi-band demodulation analysis, we present a novel parameterization of the speech and vocal excitation signals. A pertinent representation of the most dominant primary frequencies present in the speech signal is first built. It is then applied to frames of the speech signal to derive effective speaker-discriminative features. The source-related amplitude and phase quantities are also parameterized into feature vectors. The features are assessed in the context of a standard speaker identification and verification system. Score-level system fusion reveals that the modulation features are complementary to MFCCs.
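To make the AM-FM demodulation idea concrete, here is a minimal sketch of how the amplitude envelope and instantaneous frequency of one narrow-band signal can be estimated. The abstract does not say which demodulation algorithm the authors use. This sketch assumes the Teager-Kaiser energy operator with the DESA-2 energy separation algorithm, a common choice in multi-band demodulation analysis. The function names `teager` and `desa2` and the test-tone parameters are illustrative, not taken from the paper. In a multi-band setting the same routine would be run on the output of each band-pass filter; the filterbank is omitted here.

```python
import math


def teager(x, n):
    """Teager-Kaiser energy operator at sample n: x[n]^2 - x[n-1]*x[n+1]."""
    return x[n] * x[n] - x[n - 1] * x[n + 1]


def desa2(x, fs):
    """Estimate amplitude and frequency (Hz) of a narrow-band signal.

    Assumed algorithm: DESA-2 (not confirmed as the paper's method).
    Returns two lists covering the interior samples where the
    estimate is defined.
    """
    # Symmetric difference y[n] = x[n+1] - x[n-1]; the edges stay at zero.
    y = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        y[n] = x[n + 1] - x[n - 1]

    amps, freqs = [], []
    for n in range(2, len(x) - 2):
        psi_x = teager(x, n)
        psi_y = teager(y, n)
        if psi_x <= 0.0 or psi_y <= 0.0:
            continue  # energy operator not positive: estimate undefined here
        # Clamp to the valid arccos domain to absorb rounding error.
        c = max(-1.0, min(1.0, 1.0 - psi_y / (2.0 * psi_x)))
        omega = 0.5 * math.acos(c)            # radians per sample
        amps.append(2.0 * psi_x / math.sqrt(psi_y))
        freqs.append(omega * fs / (2.0 * math.pi))
    return amps, freqs


# Hypothetical test tone: amplitude 0.8, 500 Hz, sampled at 8 kHz.
fs = 8000.0
tone = [0.8 * math.cos(2.0 * math.pi * 500.0 * n / fs + 0.3) for n in range(200)]
amps, freqs = desa2(tone, fs)
```

For a pure sinusoid, the DESA-2 estimates are exact up to floating-point rounding, so `amps` and `freqs` should sit at 0.8 and 500 Hz for this tone. On real speech bands the estimates fluctuate and are usually smoothed or averaged per frame before being turned into features.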