Improvement of speaker identification by combining prosodic features with acoustic features

Authors:
Rong Zheng;Shuwu Zhang;Bo Xu
Affiliations:
High Technology and Innovation Center, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China (all authors)
Venue:
SINOBIOMETRICS'04 Proceedings of the 5th Chinese conference on Advances in Biometric Person Authentication
Year:
2004

Abstract

In this paper, we study prosodic features derived from pitch parameters to improve the performance of a speaker identification (SID) system. To deal with the problem of missing pitch in telephone speech, we estimate pitch for every frame, even in unvoiced regions. After removing silence frames, we also improve prosodic modeling by using a weighted form of the logarithm of pitch. The new prosodic features are then combined with MFCC parameters. Using our Gaussian Mixture Model-Universal Background Model (GMM-UBM) recognizer, we conduct SID experiments on the NIST 2001 cellular telephone corpus. Compared to MFCC features alone, the combined features yield a 7.0% relative error reduction for male speakers and 2.5% for female speakers. We also discuss advanced pitch extraction and modeling approaches for improving SID systems.
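As a rough illustration of the feature combination and the reported metric, the sketch below appends a weighted log-pitch value to each frame's MFCC vector and computes a relative error reduction. It is not the authors' implementation: the abstract does not specify the weighting scheme, so `weights`, `floor_hz`, and both function names are assumptions made here for illustration only.

```python
import math


def combine_features(mfcc_frames, pitch_hz, weights=None, floor_hz=1.0):
    """Append a weighted log-pitch value to each frame's MFCC vector.

    mfcc_frames: list of per-frame MFCC coefficient lists (silence already removed).
    pitch_hz:    one pitch estimate per frame, including unvoiced frames.
    weights:     hypothetical per-frame weights (assumption; defaults to 1.0).
    floor_hz:    hypothetical lower bound that keeps the logarithm defined.
    """
    if len(mfcc_frames) != len(pitch_hz):
        raise ValueError("MFCC and pitch sequences must have the same length")
    if weights is None:
        weights = [1.0] * len(pitch_hz)
    if len(weights) != len(pitch_hz):
        raise ValueError("weights must have one entry per frame")

    combined = []
    for coeffs, f0, w in zip(mfcc_frames, pitch_hz, weights):
        log_f0 = math.log(max(f0, floor_hz))
        combined.append(list(coeffs) + [w * log_f0])
    return combined


def relative_error_reduction(baseline_err, new_err):
    """Relative error reduction in percent, given two error rates."""
    return 100.0 * (baseline_err - new_err) / baseline_err
```

Under this definition, a purely hypothetical baseline error rate of 20.0% that falls to 18.6% would correspond to a 7.0% relative reduction; the abstract reports only the relative figures, not the underlying error rates.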