Feature projection by non-linear discriminant analysis (NLDA) can substantially increase classification performance. In automatic speech recognition (ASR), the projection given by the pre-squashed outputs (the activations before the output nonlinearity) of a multi-layer perceptron (MLP) with one hidden layer, trained to recognise speech sub-units (phonemes), has previously been shown to improve ASR performance significantly. An analogous approach cannot be applied directly to speaker recognition, because there is no recognised set of "speaker sub-units" to provide a finite set of MLP target classes, and for many applications it is not practical to train an MLP with one output per target speaker. In this paper we show that the output of the second hidden layer (the compression layer) of an MLP with three hidden layers, trained to identify a subset of 100 speakers selected at random from a set of 300 training speakers in TIMIT, can provide a 77% relative error reduction for standard Gaussian mixture model (GMM) based speaker identification.
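To make the pipeline concrete, here is a minimal NumPy sketch of the two stages. First, frames are projected through the first two layers of a three-hidden-layer MLP, and the activations of the compression layer are taken as new features. Second, one diagonal-covariance GMM is trained per speaker on those features, and a test utterance is identified by the highest average per-frame log-likelihood. Several details are assumptions and are not taken from the paper: the sigmoid hidden units, the layer sizes (apart from the 100 speaker outputs), the `pre_squash` flag (the abstract does not say whether compression-layer values are read before or after the nonlinearity), the EM settings, and every function name. The MLP is left untrained (backpropagation on the 100 speaker targets is omitted), and the data are synthetic rather than TIMIT.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlp(sizes, rng):
    """Hypothetical helper: random weights for sizes = [in, h1, h2, h3, out]."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def compression_features(mlp, x, layer=2, pre_squash=True):
    """Propagate frames x (T x d) and return hidden layer `layer` (1-based).
    Layer 2 is the compression layer; layers after it are only needed for
    training, so they are never evaluated here. pre_squash is an assumption."""
    h = x
    for i, (W, b) in enumerate(mlp[:layer], start=1):
        a = h @ W + b                      # linear (pre-squashed) activation
        if i == layer:
            return a if pre_squash else sigmoid(a)
        h = sigmoid(a)                     # assumed sigmoid hidden units

def log_gauss_diag(x, means, variances):
    """log N(x_t; mu_k, diag(var_k)) for every frame t and component k (T x K)."""
    d = x.shape[1]
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (d * np.log(2 * np.pi) + np.log(variances).sum(axis=1)
                   + (diff ** 2 / variances[None]).sum(axis=2))

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def fit_gmm(x, k, rng, n_iter=20, floor=1e-3):
    """Diagonal-covariance GMM trained by plain EM (a generic choice)."""
    T, d = x.shape
    means = x[rng.choice(T, k, replace=False)]
    variances = np.tile(x.var(axis=0) + floor, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame
        log_p = np.log(weights) + log_gauss_diag(x, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means and floored variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / T
        means = (resp.T @ x) / nk[:, None]
        variances = np.maximum((resp.T @ x ** 2) / nk[:, None] - means ** 2, floor)
    return weights, means, variances

def avg_loglik(x, gmm):
    weights, means, variances = gmm
    return logsumexp(np.log(weights) + log_gauss_diag(x, means, variances), axis=1).mean()

def identify(x, speaker_gmms):
    """Closed-set identification: index of the best-scoring speaker model."""
    return int(np.argmax([avg_loglik(x, g) for g in speaker_gmms]))

# --- Hypothetical demo on synthetic "speakers" (not TIMIT) ---
rng = np.random.default_rng(0)
sizes = [20, 64, 12, 64, 100]   # input, h1, compression, h3, 100 speaker outputs
mlp = init_mlp(sizes, rng)      # untrained here; normally trained on speaker targets
spk_means = rng.normal(0.0, 2.0, size=(3, 20))

def frames(s, n):
    return spk_means[s] + 0.3 * rng.normal(size=(n, 20))

gmms = [fit_gmm(compression_features(mlp, frames(s, 300)), 4, rng) for s in range(3)]
predicted = [identify(compression_features(mlp, frames(s, 100)), gmms) for s in range(3)]
print(predicted)
```

Only the layers up to the compression layer are needed at feature-extraction time. This is why the projection does not require an MLP output for each enrolled speaker: the network is trained once on a fixed set of training speakers, and the GMMs for the target speakers are built on the projected features.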