Text-independent speaker identification using temporal patterns

Authors:
Tobias Bocklet;Andreas Maier;Elmar Nöth
Affiliations:
University of Erlangen Nuremberg, Chair for Pattern Recognition, Erlangen, Germany;University of Erlangen Nuremberg, Chair for Pattern Recognition, Erlangen, Germany;University of Erlangen Nuremberg, Chair for Pattern Recognition, Erlangen, Germany
Venue:
TSD'07: Proceedings of the 10th International Conference on Text, Speech and Dialogue
Year:
2007

Abstract

In this work we present an approach to text-independent speaker recognition. As features we used Mel Frequency Cepstrum Coefficients (MFCCs) and Temporal Patterns (TRAPs). For each speaker we trained Gaussian Mixture Models (GMMs) with different numbers of densities. The database comprised very noisy close-talking recordings of 36 speakers. For training, a Universal Background Model (UBM) is first estimated with the EM algorithm on all available training data. This UBM is then used to create a speaker-dependent model for each speaker, which can be done in two ways: by taking the UBM as the initial model for further EM training, or by Maximum A Posteriori (MAP) adaptation. On the 36-speaker database, using TRAPs instead of MFCCs improved the frame-wise recognition rate by 12.0%. MAP adaptation improved the recognition rate by a further 14.2%.
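
The UBM-to-speaker-model step and the frame-wise decision described above can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say which GMM parameters are adapted or how the adaptation is weighted, so the sketch assumes a common formulation: only the means are adapted, covariances are diagonal, and a relevance factor (here 16, an arbitrary choice) balances the UBM against the speaker data. The function names, the two-component UBM and the 2-D toy frames are all hypothetical stand-ins for real MFCC or TRAP feature vectors.

```python
import math


def component_log_terms(x, weights, means, variances):
    """log(w_k) + log N(x | mean_k, diag(var_k)) for every mixture component k."""
    terms = []
    for w, mean, var in zip(weights, means, variances):
        log_density = sum(
            -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
            for xi, m, v in zip(x, mean, var)
        )
        terms.append(math.log(w) + log_density)
    return terms


def frame_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one frame under the GMM (log-sum-exp for stability)."""
    terms = component_log_terms(x, weights, means, variances)
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation (assumed variant); weights and variances stay those of the UBM."""
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                         # soft frame count per component
    sums = [[0.0] * dim for _ in range(n_comp)]     # posterior-weighted feature sums
    for x in frames:
        terms = component_log_terms(x, weights, means, variances)
        top = max(terms)
        unnorm = [math.exp(t - top) for t in terms]
        total = sum(unnorm)
        for k, u in enumerate(unnorm):
            post = u / total                        # P(component k | frame x)
            counts[k] += post
            for i, xi in enumerate(x):
                sums[k][i] += post * xi
    adapted = []
    for k in range(n_comp):
        if counts[k] == 0.0:                        # component saw no data: keep the UBM mean
            adapted.append(list(means[k]))
            continue
        alpha = counts[k] / (counts[k] + relevance)  # more data gives more weight to the speaker
        adapted.append([
            alpha * (s / counts[k]) + (1.0 - alpha) * m
            for s, m in zip(sums[k], means[k])
        ])
    return adapted


def classify_frame(x, speaker_means, weights, variances):
    """Frame-wise decision: the speaker whose adapted GMM scores the frame highest."""
    return max(
        speaker_means,
        key=lambda name: frame_log_likelihood(x, weights, speaker_means[name], variances),
    )


# Hypothetical toy UBM with two components in a 2-D feature space.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [4.0, 4.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical enrollment frames for two speakers.
enrollment = {
    "A": [[1.0, 1.0]] * 8 + [[5.0, 5.0]] * 8,
    "B": [[-1.0, -1.0]] * 8 + [[3.0, 3.0]] * 8,
}

speaker_means = {
    name: map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars)
    for name, frames in enrollment.items()
}

print(classify_frame([0.8, 0.8], speaker_means, ubm_weights, ubm_vars))
print(classify_frame([3.2, 3.2], speaker_means, ubm_weights, ubm_vars))
```

With only 16 enrollment frames per speaker, the adapted means move about a third of the way from the UBM means toward the speaker data. This shows why MAP adaptation can be more robust than re-running EM from the UBM when speaker data is scarce: components that see few frames stay close to the background model.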