An investigation into subspace rapid speaker adaptation for verification

  • Authors:
  • S. Lucey; Tsuhan Chen

  • Affiliations:
  • Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA; Dept. of Electr. & Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA

  • Venue:
  • ICME '03 Proceedings of the 2003 International Conference on Multimedia and Expo - Volume 2
  • Year:
  • 2003

Abstract

Rapid speaker adaptation is becoming more important in emerging applications where storage, computation, and training utterances are at a premium (e.g. PDAs, cell phones). For speaker verification, effective adaptation can be achieved within a maximum a posteriori (MAP) learning framework by restricting the client's parametric model to be a linear combination of the parameters estimated from the client's training observations and those of a speaker-independent "world" model, i.e. relevance adaptation (RA). Subspace adaptation (SA) instead restricts a client's parametric representation to a pre-defined subspace during estimation. In this paper we elucidate where subspace adaptation outperforms relevance adaptation, demonstrate where and why subspace adaptation is sometimes less effective, and give insights into what cost criteria should be used to construct the adaptation subspace. Results are presented on the acoustic portion of the XM2VTS database for the task of Gaussian mixture model (GMM) based text-independent speaker verification.
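
The two adaptation styles named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The relevance-adaptation function uses the widely used MAP mean update for a diagonal-covariance GMM, in which each adapted mean is `alpha * ml_mean + (1 - alpha) * world_mean` with `alpha = n / (n + relevance)`. The subspace function is a deliberately simple stand-in: it assumes a plain Euclidean (least-squares) projection of the client's mean offset onto a given basis, which is only one possible cost criterion and may differ from those examined in the paper. All function names, parameter names, and the default `relevance=16.0` are assumptions made for illustration.

```python
import numpy as np


def relevance_adapt_means(X, weights, means, variances, relevance=16.0):
    """MAP-adapt the means of a diagonal-covariance world GMM to client data.

    X: (T, D) feature vectors; weights: (M,); means, variances: (M, D).
    Returns adapted means of shape (M, D). Names are illustrative only.
    """
    # Per-frame, per-component Gaussian log-likelihoods, shape (T, M).
    diff = X[:, None, :] - means[None, :, :]
    log_prob = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )

    # Responsibilities (posterior component probabilities), computed stably.
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Soft counts and maximum-likelihood means from the client data alone.
    n = post.sum(axis=0)                                # (M,)
    ml_means = (post.T @ X) / np.maximum(n, 1e-10)[:, None]

    # Linear combination of client estimate and world model.
    alpha = (n / (n + relevance))[:, None]
    return alpha * ml_means + (1.0 - alpha) * means


def project_onto_subspace(client_means, world_means, basis):
    """Restrict the client's offset from the world means to span(basis).

    basis: (M * D, K) matrix whose columns span the assumed subspace.
    Uses an unweighted least-squares projection (an assumption of this sketch).
    """
    offset = (client_means - world_means).ravel()
    q, _ = np.linalg.qr(basis)          # orthonormalise the basis columns
    restricted = q @ (q.T @ offset)     # projection of the offset
    return world_means + restricted.reshape(world_means.shape)
```

With few training frames `n` is small, so `alpha` is small and the relevance-adapted means stay close to the world model. The subspace variant instead constrains the direction in which the means may move, however much data is available.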