Improvement of speaker identification by combining prosodic features with acoustic features

  • Authors:
  • Rong Zheng; Shuwu Zhang; Bo Xu

  • Affiliations:
  • High Technology and Innovation Center, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China (all authors)

  • Venue:
  • SINOBIOMETRICS'04 Proceedings of the 5th Chinese conference on Advances in Biometric Person Authentication
  • Year:
  • 2004

Abstract

In this paper, we study prosodic features derived from pitch parameters to improve the performance of a speaker identification (SID) system. To deal with the problem of missing pitch in telephone speech, we estimate pitch for every frame, even in unvoiced regions. After removing silence frames, we further improve prosodic modeling with a weighted form of the logarithm of pitch. The new prosodic features are then combined with MFCC parameters. Using our Gaussian Mixture Model-Universal Background Model (GMM-UBM) recognizer, we conduct SID experiments on the NIST 2001 cellular telephone corpus. Compared with MFCC features alone, the combined features yield a 7.0% relative error reduction for male speakers and 2.5% for female speakers. We also discuss advanced pitch extraction and modeling approaches for improving SID systems.
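
As a rough illustration of the pipeline the abstract describes, the sketch below drops silence frames, appends a weighted log-pitch value to each remaining MFCC frame, and computes a relative error reduction. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the weighting, so the scalar `weight`, the function names, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def combine_features(mfcc_frames, pitch_hz, is_silence, weight=1.0):
    """Append weight * log(pitch) to every non-silence MFCC frame.

    Assumes a pitch estimate exists for every frame (including unvoiced
    ones), as the abstract describes. The scalar `weight` is a hypothetical
    stand-in for the paper's unspecified weighting of log-pitch.
    """
    combined = []
    for mfcc, f0, silent in zip(mfcc_frames, pitch_hz, is_silence):
        if silent:
            continue  # silence frames are removed before modeling
        combined.append(list(mfcc) + [weight * math.log(f0)])
    return combined


def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error rate removed by the new system."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Illustrative inputs only; none of these numbers come from the paper.
frames = [[1.0, 2.0], [0.5, 0.1], [0.3, 0.4]]  # toy 2-dimensional "MFCC" frames
pitch = [100.0, 120.0, 200.0]                  # per-frame pitch estimates in Hz
silence = [False, True, False]                 # second frame is silence

features = combine_features(frames, pitch, silence, weight=0.5)
rer = relative_error_reduction(0.20, 0.186)    # made-up error rates
```

In this toy setup, an error rate falling from 20.0% to 18.6% corresponds to a 7.0% relative reduction, which shows how a figure of that kind is computed. The combined vectors would then be used to train and score the GMM-UBM models in place of the MFCC-only vectors.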