MLP internal representation as discriminative features for improved speaker recognition

Authors:
Dalei Wu; Andrew Morris; Jacques Koreman
Affiliations:
Institute of Phonetics, Saarland University, Saarbrücken, Germany (all three authors)
Venue:
NOLISP'05: Proceedings of the 3rd International Conference on Non-Linear Analyses and Algorithms for Speech Processing
Year:
2005

Abstract

Feature projection by non-linear discriminant analysis (NLDA) can substantially increase classification performance. In automatic speech recognition (ASR), projecting features onto the pre-squashed outputs (the output-layer values taken before the squashing nonlinearity) of a multi-layer perceptron (MLP) with one hidden layer, trained to recognise speech sub-units (phonemes), has previously been shown to improve ASR performance significantly. This approach cannot be applied directly to speaker recognition, because there is no recognised set of "speaker sub-units" to provide a finite set of MLP target classes, and for many applications it is not practical to train an MLP with one output per target speaker. In this paper we show that the output of the second hidden layer (the compression layer) of an MLP with three hidden layers, trained to identify 100 speakers selected at random from a set of 300 TIMIT training speakers, can provide a 77% relative error reduction in standard Gaussian mixture model (GMM) based speaker identification.
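The projection described in the abstract can be sketched in a few dozen lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the layer sizes, tanh hidden units, plain full-batch gradient descent, the number of mixture components and the variance floor are all hypothetical choices. The sketch also takes the post-tanh activation of the compression layer as the feature, since the abstract does not say whether the squashed or pre-squashed value is used. An MLP with three hidden layers is trained to classify a set of training speakers; the output of its second hidden layer then serves as the projected feature vector, one diagonal-covariance GMM is trained per enrolled speaker on those features, and a test utterance is assigned to the speaker whose GMM gives the highest average frame log-likelihood.

```python
import numpy as np

# Sketch only: layer sizes, tanh hidden units, full-batch gradient descent and the
# diagonal-covariance GMM back end are illustrative assumptions, not the paper's setup.

def init_mlp(sizes, rng):
    """One (W, b) pair per layer, e.g. sizes = [n_in, h1, compression, h3, n_speakers]."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, X):
    """Return every layer's activations: tanh in hidden layers, softmax at the output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        if i < len(params) - 1:
            acts.append(np.tanh(z))
        else:
            e = np.exp(z - z.max(axis=1, keepdims=True))
            acts.append(e / e.sum(axis=1, keepdims=True))
    return acts

def train_mlp(params, X, y, n_classes, lr=0.1, epochs=200):
    """Train the speaker-classification MLP in place; return the cross-entropy history."""
    Y = np.eye(n_classes)[y]
    losses = []
    for _ in range(epochs):
        acts = forward(params, X)
        losses.append(-np.mean(np.sum(Y * np.log(acts[-1] + 1e-12), axis=1)))
        delta = (acts[-1] - Y) / len(X)  # gradient w.r.t. the output pre-activations
        for i in range(len(params) - 1, -1, -1):
            W, b = params[i]
            gW, gb = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:  # propagate through the tanh of the layer below, using the old W
                delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
            params[i] = (W - lr * gW, b - lr * gb)
    return losses

def compression_features(params, X):
    """Projected features: the output of the second hidden (compression) layer."""
    return forward(params, X)[2]

def _logsumexp_rows(a):
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def _component_logpdf(X, w, means, var):
    """log w_k + log N(x | mu_k, diag(var_k)) for every frame/component pair."""
    diff2 = ((X[:, None, :] - means[None]) ** 2 / var[None]).sum(axis=2)
    return np.log(w) - 0.5 * np.log(2 * np.pi * var).sum(axis=1) - 0.5 * diff2

def fit_diag_gmm(X, k, rng, iters=30, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to one speaker's feature frames with EM."""
    n = len(X)
    means = X[rng.choice(n, k, replace=False)]
    var = np.tile(X.var(axis=0) + var_floor, (k, 1))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        logp = _component_logpdf(X, w, means, var)
        resp = np.exp(logp - _logsumexp_rows(logp)[:, None])  # E-step
        nk = resp.sum(axis=0) + 1e-10                          # M-step
        w = nk / n
        means = (resp.T @ X) / nk[:, None]
        var = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, var_floor)
    return w, means, var

def identify(X, gmms):
    """Closed-set identification: speaker whose GMM gives the highest mean log-likelihood."""
    scores = [np.mean(_logsumexp_rows(_component_logpdf(X, *g))) for g in gmms]
    return int(np.argmax(scores))
```

For example, with `sizes = [n_in, 16, 6, 16, n_train_speakers]`, calling `train_mlp` on labelled frames and then `compression_features` on each enrolled speaker's frames yields 6-dimensional projected features for `fit_diag_gmm`. Because only the compression layer is kept, the size of the output layer is tied to the training speakers rather than to the speakers enrolled later, which is how this kind of setup avoids needing one MLP output per target speaker.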