Enhancing Speaker Discrimination at the Feature Level

Authors:
Jacques Koreman;Dalei Wu;Andrew C. Morris
Affiliations:
Department of Language and Communication Studies, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway;Department of Language and Communication Studies, Norwegian University of Science and Technology, NO-7491 Trondheim, Norway;SpinVox Ltd., Wethered House, Pound Lane, Marlow, Bucks, SL7 2AF, United Kingdom
Venue:
Speaker Classification I
Year:
2007

Abstract

This chapter describes a method for enhancing the differences between speaker classes at the feature level (feature enhancement) in an automatic speaker recognition system. The original Mel-frequency cepstral coefficient (MFCC) space is projected onto a new feature space by a neural network trained on a subset of speakers that is representative of the whole target population. The new feature space discriminates better between the target classes (speakers) than the original feature space does. The chapter focuses on how to select a representative subset of speakers and compares several approaches to speaker selection. The effect of feature enhancement is tested on clean speech and on several types of noisy speech to evaluate its applicability under practical conditions. The proposed method is shown to lead to a substantial improvement in speaker recognition performance. It can also be applied to other automatic speaker classification tasks.
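
The projection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the network architecture, which layer supplies the new features, or how training is done, so the one-hidden-layer network, the use of hidden-layer activations as enhanced features, the layer sizes, and the synthetic "MFCC" frames below are all assumptions made for illustration. The names `train_mlp` and `enhance` are hypothetical.

```python
import numpy as np

def train_mlp(X, y, n_hidden=8, epochs=200, lr=0.1, seed=0):
    """Train a one-hidden-layer MLP to classify frames by speaker.

    Assumed setup (not taken from the chapter): tanh hidden layer,
    softmax output, full-batch gradient descent on mean cross-entropy.
    Returns only the input-to-hidden weights, which define the projection.
    """
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    W1 = rng.normal(0.0, 0.1, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot speaker labels
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden activations
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)         # softmax posteriors
        G = (P - Y) / len(X)                      # d(loss)/d(logits)
        dH = (G @ W2.T) * (1.0 - H ** 2)          # backprop through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1

def enhance(X, W1, b1):
    """Project frames into the hidden-layer space of the trained network."""
    return np.tanh(X @ W1 + b1)

# Synthetic stand-in for MFCC frames: 4 "representative" speakers,
# 50 frames each, 13 coefficients per frame (all values are made up).
rng = np.random.default_rng(1)
n_speakers, n_frames, n_mfcc = 4, 50, 13
means = rng.normal(0.0, 2.0, (n_speakers, n_mfcc))
X = np.concatenate([m + rng.normal(0.0, 1.0, (n_frames, n_mfcc)) for m in means])
y = np.repeat(np.arange(n_speakers), n_frames)

W1, b1 = train_mlp(X, y)

# Frames from a speaker the network never saw are projected the same way.
new_speaker = rng.normal(0.0, 2.0, n_mfcc) + rng.normal(0.0, 1.0, (n_frames, n_mfcc))
enhanced = enhance(new_speaker, W1, b1)
print(enhanced.shape)  # (50, 8)
```

In such a setup the enhanced frames would replace the raw MFCC frames as input to whatever speaker model follows. How the representative speaker subset is chosen, which is the chapter's main topic, is not modeled here.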