A nonlinear autoregressive model for speaker verification

  • Authors:
  • Sundararajan Srinivasan; Tao Ma; Georgios Lazarou; Joseph Picone

  • Affiliations:
  • Nuance Communications Inc., Sunnyvale, USA 94085; Apple Inc., Cupertino, USA 95014; The New York City Transit Authority, New York, USA 11103; Department of Electrical and Computer Engineering, Temple University, Philadelphia, USA 19027

  • Venue:
  • International Journal of Speech Technology
  • Year:
  • 2014

Abstract

Gaussian mixture models (GMMs) have been the most popular approach to speaker recognition and verification for over two decades. The limitations of this model for signals such as speech are well documented and include an inability to model the temporal dependencies that arise from nonlinearities in the speech signal. The resulting models are often complex and overdetermined, which leads to poor generalization. In this paper, we present a nonlinear mixture autoregressive model (MixAR) that attempts to directly model nonlinearities in the trajectories of the speech features. We apply this model to the problem of speaker verification. Experiments with synthetic data demonstrate the viability of the model. Evaluations on standard speech databases, including TIMIT, NTIMIT, and NIST-2001, demonstrate that MixAR, using only half the number of parameters and only static features, can achieve a lower equal error rate (EER) than GMMs, particularly in the presence of previously unseen noise. Performance as a function of the duration of both the training and evaluation utterances is also analyzed.
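The abstract does not give the MixAR equations. As a rough illustration only, here is a minimal Python sketch of how a mixture-autoregressive density could score a one-dimensional feature trajectory. It makes three assumptions: each component is a linear autoregressive (AR) predictor with Gaussian noise, the mixing weights are a softmax over linear functions of the past values (the source of the nonlinearity), and verification uses the average per-frame log-likelihood ratio between a claimed-speaker model and a background model. The class `MixAR`, its parameter names, and `verification_score` are hypothetical. This is not the authors' implementation, and real systems model multidimensional cepstral feature vectors rather than a scalar sequence.

```python
import math
from typing import Sequence


def log_normal(x: float, mean: float, var: float) -> float:
    """Log density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


class MixAR:
    """Hypothetical scalar mixture-autoregressive model (illustrative only).

    Component i predicts x_t as biases[i] + sum_j ar_coeffs[i][j] * x_{t-1-j}
    with Gaussian noise of variance variances[i]. The mixing weights are a
    softmax over linear functions of the same past values (an assumed gating
    form), so the overall predictor is a nonlinear function of the past.
    """

    def __init__(self, ar_coeffs, biases, variances, gate_weights, gate_biases):
        self.ar_coeffs = ar_coeffs        # M lists of p AR coefficients
        self.biases = biases              # M AR intercepts
        self.variances = variances        # M noise variances
        self.gate_weights = gate_weights  # M lists of p gating weights
        self.gate_biases = gate_biases    # M gating intercepts
        self.order = len(ar_coeffs[0])    # AR order p

    def _log_gates(self, past):
        # Log-softmax of the gating scores, so the weights sum to 1 over components.
        scores = [b + sum(w * x for w, x in zip(ws, past))
                  for ws, b in zip(self.gate_weights, self.gate_biases)]
        norm = log_sum_exp(scores)
        return [s - norm for s in scores]

    def log_likelihood(self, xs: Sequence[float]) -> float:
        """Sum of log p(x_t | x_{t-1}, ..., x_{t-p}) for t = p, ..., len(xs) - 1."""
        p = self.order
        total = 0.0
        for t in range(p, len(xs)):
            past = list(reversed(xs[t - p:t]))  # [x_{t-1}, ..., x_{t-p}]
            terms = []
            for log_g, coeffs, b, var in zip(self._log_gates(past), self.ar_coeffs,
                                             self.biases, self.variances):
                mean = b + sum(a * x for a, x in zip(coeffs, past))
                terms.append(log_g + log_normal(xs[t], mean, var))
            total += log_sum_exp(terms)
        return total


def verification_score(target: MixAR, background: MixAR, xs: Sequence[float]) -> float:
    """Average per-frame log-likelihood ratio; accept the claim if above a threshold."""
    if target.order != background.order:
        raise ValueError("this sketch assumes both models share the same AR order")
    frames = len(xs) - target.order
    return (target.log_likelihood(xs) - background.log_likelihood(xs)) / frames
```

For example, on a trajectory generated by x_t = 0.9 x_{t-1}, a one-component target model with AR coefficient 0.9 fits better than a background model with coefficient 0, so `verification_score` is positive. With a single component, or with gates that do not depend on the past, the sketch reduces to an ordinary linear AR predictor; all of the nonlinearity comes from the input-dependent gating.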