Speaker recognition via nonlinear phonetic- and speaker-discriminative features

Authors:
Lara Stoll; Joe Frankel; Nikki Mirghafori
Affiliations:
International Computer Science Institute, Berkeley, CA and University of California at Berkeley, CA; International Computer Science Institute, Berkeley, CA and Centre for Speech Technology Research, Edinburgh, UK; International Computer Science Institute, Berkeley, CA
Venue:
NOLISP'07: Proceedings of the 2007 international conference on Advances in nonlinear speech processing
Year:
2007

Abstract

We use a multi-layer perceptron (MLP) to transform cepstral features into features better suited for speaker recognition. Two types of MLP output targets are considered: phones (Tandem/HATS-MLP) and speakers (Speaker-MLP). In the former case, output activations are used as features in a GMM speaker recognition system, while for the latter, hidden activations are used as features in an SVM system. Using a smaller set of MLP training speakers, chosen through clustering, yields system performance similar to that of a Speaker-MLP trained with many more speakers. For the NIST Speaker Recognition Evaluation 2004, both Tandem/HATS-GMM and Speaker-SVM systems improve upon a basic GMM baseline, but are unable to contribute in a score-level combination with a state-of-the-art GMM system. It may be that the application of normalizations and channel compensation techniques to the current state-of-the-art GMM has reduced channel mismatch errors to the point that contributions of the MLP systems are no longer additive.
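The two feature types described in the abstract can be illustrated with a minimal sketch. It assumes a single-hidden-layer MLP with random, untrained weights, and every dimension in it (39 cepstral coefficients, 64 hidden units, 46 targets, 200 frames) is an illustrative placeholder rather than a value from the paper. Taking the log of the output activations and averaging hidden activations over frames are also assumptions made for the example, not steps confirmed by the abstract.

```python
import numpy as np


def mlp_forward(frames, w_hidden, b_hidden, w_out, b_out):
    """Return (hidden activations, output posteriors) for a batch of frames."""
    # Sigmoid hidden layer applied to each cepstral frame.
    hidden = 1.0 / (1.0 + np.exp(-(frames @ w_hidden + b_hidden)))
    # Softmax output layer over the training targets (phones or speakers).
    logits = hidden @ w_out + b_out
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    posteriors = exp / exp.sum(axis=1, keepdims=True)
    return hidden, posteriors


# All sizes below are hypothetical, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
n_frames, n_cepstra, n_hidden, n_targets = 200, 39, 64, 46

frames = rng.standard_normal((n_frames, n_cepstra))       # stand-in cepstral features
w_hidden = rng.standard_normal((n_cepstra, n_hidden)) * 0.1
b_hidden = np.zeros(n_hidden)
w_out = rng.standard_normal((n_hidden, n_targets)) * 0.1
b_out = np.zeros(n_targets)

hidden, posteriors = mlp_forward(frames, w_hidden, b_hidden, w_out, b_out)

# Phone-target case: output activations become frame-level features for a GMM.
# The log transform is an assumption of this sketch.
gmm_features = np.log(posteriors + 1e-10)

# Speaker-target case: hidden activations become features for an SVM.
# Mean pooling to one vector per utterance is an assumption of this sketch.
svm_features = hidden.mean(axis=0)
```

Here `gmm_features` keeps one row per frame, as a frame-based GMM would expect, while `svm_features` is a single fixed-length vector for the whole utterance.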