In this work we present an approach to text-independent speaker recognition. As features we use Mel Frequency Cepstral Coefficients (MFCCs) and Temporal Patterns (TRAPs). For each speaker we train Gaussian Mixture Models (GMMs) with different numbers of densities. The database contains very noisy close-talking recordings of 36 speakers. For training, a Universal Background Model (UBM) is first built from all available training data with the EM algorithm. This UBM is then used to create a speaker-dependent model for each speaker, which can be done in two ways: by taking the UBM as the initial model for EM training, or by Maximum A Posteriori (MAP) adaptation. On the 36-speaker database, using TRAPs instead of MFCCs improves the frame-wise recognition rate by 12.0%. MAP adaptation improves the recognition rate by a further 14.2%.
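The MAP adaptation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes diagonal-covariance GMMs, adapts only the component means, and uses a relevance factor of 16, none of which is stated in the abstract. All function names are hypothetical, and the feature frames are taken as given (any MFCC or TRAP extraction happens beforehand).

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log density of every frame under every diagonal Gaussian.
    X: (T, D) frames; means, variances: (K, D). Returns (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1))

def posteriors(X, weights, means, variances):
    """Component responsibilities for each frame, shape (T, K)."""
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Shift the UBM means toward one speaker's data.
    ASSUMPTION: means-only adaptation with relevance factor 16."""
    gamma = posteriors(X, weights, means, variances)
    n = gamma.sum(axis=0)                            # soft counts, (K,)
    ex = gamma.T @ X / np.maximum(n, 1e-10)[:, None]  # data means, (K, D)
    alpha = (n / (n + relevance))[:, None]           # adaptation weights
    return alpha * ex + (1.0 - alpha) * means

def avg_log_likelihood(X, weights, means, variances):
    """Average per-frame log-likelihood of X under the GMM."""
    log_p = np.log(weights) + log_gauss_diag(X, means, variances)
    m = log_p.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))))
```

In use, one would call `map_adapt_means` once per speaker with that speaker's training frames and the shared UBM parameters, then assign a test utterance to the speaker whose adapted model yields the highest `avg_log_likelihood`. Components that see little speaker data stay close to the UBM, which is the main practical difference from re-running EM from the UBM as the initial model.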