Application of Multiple Classifier Techniques to Subband Speaker Identification with an HMM/ANN System

Authors:
Jonathan E. Higgins; Tony J. Dodd; Robert I. Damper
Venue:
MCS '01 Proceedings of the Second International Workshop on Multiple Classifier Systems
Year:
2001

Abstract

In previous work, we have confirmed the performance gains that can be obtained in speaker recognition by splitting the (clean) wide-band speech signal into several subbands, employing a separate pattern classifier for each subband, and then using multiple classifier fusion ('recombination') techniques to produce a final decision. However, that earlier work used a fairly rudimentary recognition technique (dynamic time warping), only the sum and product fusion rules, and only the spoken word 'seven'. The question then arises: can subband processing still deliver performance gains when using state-of-the-art recognition techniques, more sophisticated recombination, and different spoken digits? To answer this, we have applied hidden Markov modelling and artificial neural network (ANN) recombination to text-dependent speaker identification for the spoken digits 'seven' and 'nine'. We find that ANN recombination performs about as well as the sum rule operating in log probability space, but the ANN results are not unique: they depend critically on user-specified parameters, initialisation, etc. On clean speech, all classifiers achieve close to 100% identification accuracy. Subband techniques offer advantages when the speech signal is significantly degraded by noise.
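To make the difference between fusion rules concrete, here is a minimal sketch of recombining per-subband classifier outputs for closed-set speaker identification. It assumes (this is not taken from the paper) that each subband classifier returns one log-likelihood per enrolled speaker, and it interprets "the sum rule operating in log probability space" as adding those log-likelihoods across subbands, which is equivalent to the product rule on likelihoods. It contrasts this with a sum rule applied to per-subband posterior probabilities under equal speaker priors. The function names and the example scores are hypothetical and for illustration only.

```python
import math
from typing import List, Sequence

# Hypothetical input: scores[b][s] is the log-likelihood that subband b's
# classifier (e.g. a per-speaker HMM) assigns to speaker s for one utterance.
Scores = Sequence[Sequence[float]]


def log_posteriors(band: Sequence[float]) -> List[float]:
    """Turn one subband's speaker log-likelihoods into log posteriors,
    assuming equal speaker priors (log-sum-exp normalisation)."""
    m = max(band)
    log_z = m + math.log(sum(math.exp(x - m) for x in band))
    return [x - log_z for x in band]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def sum_rule_log(scores: Scores) -> int:
    """Add per-subband log-likelihoods for each speaker. Adding logs is the
    same as multiplying likelihoods, i.e. the product rule on probabilities."""
    n_speakers = len(scores[0])
    totals = [sum(band[s] for band in scores) for s in range(n_speakers)]
    return argmax(totals)


def sum_rule_prob(scores: Scores) -> int:
    """Sum rule on probabilities: add per-subband posteriors for each speaker."""
    posts = [[math.exp(lp) for lp in log_posteriors(band)] for band in scores]
    n_speakers = len(scores[0])
    totals = [sum(p[s] for p in posts) for s in range(n_speakers)]
    return argmax(totals)


# Made-up scores for 3 subbands x 3 enrolled speakers (illustration only).
EXAMPLE_SCORES = [
    [-100.0, -102.0, -101.0],  # subband 1
    [-95.0, -94.0, -99.0],     # subband 2
    [-110.0, -111.0, -108.0],  # subband 3
]

if __name__ == "__main__":
    print(sum_rule_log(EXAMPLE_SCORES))   # 0
    print(sum_rule_prob(EXAMPLE_SCORES))  # 2
```

On these made-up scores the two rules pick different speakers. The log-domain sum heavily penalises speaker 2 for its very poor score in subband 2, whereas adding probabilities bounds each subband's contribution, so speaker 2's strong showing in subband 3 wins. A trained ANN recombiner would instead learn the mapping from the stacked subband scores to a decision, which is one reason its results depend on training choices such as initialisation.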