Text-independent speaker identification using temporal patterns

  • Authors:
  • Tobias Bocklet;Andreas Maier;Elmar Nöth

  • Affiliations:
  • University of Erlangen Nuremberg, Chair for Pattern Recognition, Erlangen, Germany (all authors)

  • Venue:
  • TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue
  • Year:
  • 2007

Abstract

In this work we present an approach to text-independent speaker recognition. As features we use Mel Frequency Cepstral Coefficients (MFCCs) and Temporal Patterns (TRAPs). For each speaker we train Gaussian Mixture Models (GMMs) with different numbers of densities. The database contains very noisy close-talking recordings of 36 speakers. For training, a Universal Background Model (UBM) is first built from all available training data with the EM algorithm. This UBM is then used to create a speaker-dependent model for each speaker, which can be done in two ways: by taking the UBM as the initial model for EM training, or by Maximum A Posteriori (MAP) adaptation. On the 36-speaker database, using TRAPs instead of MFCCs improves the frame-wise recognition rate by 12.0%. MAP adaptation increases the recognition rate by a further 14.2%.
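
The UBM-to-speaker-model step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal-covariance GMMs, adapts only the component means (weights and variances are copied from the UBM), and uses a relevance factor of 16; none of these choices is stated in the abstract. All function names are hypothetical, and the feature vectors are toy 2-D points standing in for MFCC or TRAP frames.

```python
import math

# A GMM is a tuple (weights, means, variances) with diagonal covariances.
# This layout is an assumption made for the sketch.

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def component_log_scores(x, gmm):
    """log(weight) + log density of every mixture component for frame x."""
    weights, means, variances = gmm
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]

def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def posteriors(x, gmm):
    """Probability that each component generated frame x."""
    scores = component_log_scores(x, gmm)
    norm = log_sum_exp(scores)
    return [math.exp(s - norm) for s in scores]

def map_adapt_means(frames, ubm, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to one speaker's frames."""
    weights, means, variances = ubm
    k, d = len(means), len(means[0])
    counts = [0.0] * k                    # soft frame count per component
    sums = [[0.0] * d for _ in range(k)]  # posterior-weighted feature sums
    for x in frames:
        for c, gamma in enumerate(posteriors(x, ubm)):
            counts[c] += gamma
            for j in range(d):
                sums[c][j] += gamma * x[j]
    adapted = []
    for c in range(k):
        alpha = counts[c] / (counts[c] + relevance)  # data-dependent mixing
        # alpha * (sums / counts) is written as sums / (counts + relevance)
        # so that a component with zero count simply keeps its UBM mean.
        adapted.append([sums[c][j] / (counts[c] + relevance)
                        + (1.0 - alpha) * means[c][j] for j in range(d)])
    return (weights, adapted, variances)

def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of the frames under the GMM."""
    total = sum(log_sum_exp(component_log_scores(x, gmm)) for x in frames)
    return total / len(frames)

def identify(frames, speaker_models):
    """Return the name of the speaker model with the highest score."""
    return max(speaker_models,
               key=lambda name: avg_log_likelihood(frames, speaker_models[name]))

# Toy UBM with two components; a real UBM would be trained with EM.
ubm = ([0.5, 0.5], [[0.0, 0.0], [4.0, 4.0]], [[1.0, 1.0], [1.0, 1.0]])
models = {
    "A": map_adapt_means([[1.0, 1.0]] * 20, ubm),
    "B": map_adapt_means([[-1.0, -1.0]] * 20, ubm),
}
print(identify([[0.8, 0.8], [0.9, 1.1]], models))
```

In this sketch, components that see little speaker data stay close to the UBM, while well-observed components move toward the speaker's own statistics. The alternative named in the abstract, using the UBM as the initial model for further EM training, would instead re-estimate the parameters from the speaker's data alone.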