In previous work, we confirmed the performance gains that can be obtained in speaker recognition by splitting the (clean) wide-band speech signal into several subbands, employing a separate pattern classifier for each subband, and then using multiple classifier fusion ('recombination') techniques to produce a final decision. However, that earlier work used a fairly rudimentary recognition technique (dynamic time warping), only sum or product fusion rules, and a single spoken word (seven). The question then arises: can subband processing still deliver performance gains when using state-of-the-art recognition techniques, more sophisticated recombination, and different spoken digits? To answer this, we have applied hidden Markov modelling and artificial neural network (ANN) recombination to text-dependent speaker identification for the spoken digits seven and nine. We find that ANN recombination performs about as well as the sum rule operating in log probability space, but the ANN results are not unique: they depend critically on user-specified parameters, initialisation, etc. On clean speech, all classifiers achieve close to 100% identification. Subband techniques offer advantages when the speech signal is significantly degraded by noise.
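To make the recombination step concrete, the following is a minimal sketch of fixed-rule fusion over per-subband classifier scores. It assumes each subband classifier returns one log-likelihood per enrolled speaker; the function names, the per-band normalisation, and the example scores are all hypothetical and are not taken from the paper.

```python
import math


def fuse_log_sum(subband_log_scores):
    """Sum rule in log probability space.

    subband_log_scores[b][s] is the (assumed) log-likelihood that
    subband b assigns to speaker s. Summing logs across subbands is
    equivalent to multiplying the underlying likelihoods.
    """
    n_speakers = len(subband_log_scores[0])
    return [sum(band[s] for band in subband_log_scores)
            for s in range(n_speakers)]


def fuse_prob_sum(subband_log_scores):
    """Sum rule in linear probability space, shown for contrast.

    Each band's scores are first normalised to sum to one (an
    illustrative assumption), then added across subbands.
    """
    n_speakers = len(subband_log_scores[0])
    fused = [0.0] * n_speakers
    for band in subband_log_scores:
        peak = max(band)  # subtract the maximum for numerical stability
        weights = [math.exp(score - peak) for score in band]
        total = sum(weights)
        for s in range(n_speakers):
            fused[s] += weights[s] / total
    return fused


def identify(fused_scores):
    """Return the index of the speaker with the highest fused score."""
    return max(range(len(fused_scores)), key=fused_scores.__getitem__)


# Hypothetical scores: 3 subbands (rows) by 3 speakers (columns).
example_scores = [
    [-10.0, -12.0, -11.0],
    [-9.0, -8.5, -13.0],
    [-20.0, -14.0, -15.0],
]
decision = identify(fuse_log_sum(example_scores))
```

Both rules are fixed and have no trainable parameters, so for a given set of scores they always give the same decision. This contrasts with ANN recombination, whose results the abstract reports as depending on user-specified parameters and initialisation.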