Robust speaker modeling using perceptually motivated feature

  • Authors: Waleed H. Abdulla

  • Affiliations: Electrical and Computer Engineering Department, Private Bag 92019, The University of Auckland, New Zealand

  • Venue: Pattern Recognition Letters
  • Year: 2007

Abstract

This paper introduces a novel method for extracting robust features for text-independent speaker identification from short utterances. The method is perceptually motivated and inspired by the perceptual linear prediction (PLP) technique; the resulting feature is called the perceptual log area ratio (PLAR). It is perceptual in the sense that it draws on notions from psychoacoustics, from which its robustness derives. The log area ratio is in turn an effective feature for recognizing speakers because it embodies the geometry and dynamics of the vocal tract, which are strongly speaker-dependent. The research thus focuses on providing a reliable vocal biometric that can be used effectively with full-band and telephone-band speech in noisy environments. An extensive performance analysis benchmarks the proposed method against commonly used features on different databases and in different noisy environments. In almost all usable cases, PLAR outperformed commonly used features such as MFCC and LPCC.
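
To make the "log area ratio" part of the feature concrete, the following is a minimal sketch of how log area ratios are conventionally obtained from the reflection coefficients of an all-pole (linear prediction) model. It is not the paper's implementation: the abstract gives no algorithmic detail, so the PLP-style perceptual front end is omitted entirely, and the function names, the sign convention for the reflection coefficients, and the clipping threshold are all assumptions made for illustration.

```python
import numpy as np

def reflection_coefficients(r, order):
    """Levinson-Durbin recursion (illustrative helper, not from the paper).

    r     : autocorrelation values r[0..order]
    order : all-pole model order
    Returns the reflection coefficients k[0..order-1] under the convention
    A(z) = 1 + a1 z^-1 + ... (other texts flip the sign of k).
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                      # prediction error energy
    k = np.zeros(order)
    for i in range(1, order + 1):
        # acc = r[i] + sum_{j=1}^{i-1} a[j] * r[i-j]
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k_i = -acc / err
        k[i - 1] = k_i
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k_i * a_prev[i - j]
        a[i] = k_i
        err *= 1.0 - k_i ** 2
    return k

def log_area_ratios(k, eps=1e-9):
    """Map reflection coefficients to log area ratios.

    Assumed convention: LAR_i = log((1 + k_i) / (1 - k_i)).
    Values are clipped away from +/-1 to keep the logarithm finite.
    """
    k = np.clip(k, -1.0 + eps, 1.0 - eps)
    return np.log((1.0 + k) / (1.0 - k))

# Usage: autocorrelation of a first-order process with r[n] = 0.5**n
r = 0.5 ** np.arange(3)
lar = log_area_ratios(reflection_coefficients(r, order=2))
```

In a PLAR-like pipeline the autocorrelation `r` would presumably come from a perceptually warped and compressed spectrum, as in PLP, rather than directly from the waveform; that step is deliberately left out here because the abstract does not specify it.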