Structurally noise resistant classifier for multi-modal person verification

Authors:
Conrad Sanderson; Kuldip K. Paliwal
Affiliations:
IDIAP, Rue du Simplon 4, Martigny CH-1920, Switzerland and Signal Processing Laboratory, School of Microelectronic Engineering, Griffith University, Nathan, Qld. 4111, Australia; Signal Processing Laboratory, School of Microelectronic Engineering, Griffith University, Nathan, Qld. 4111, Australia
Venue:
Pattern Recognition Letters
Year:
2003

Abstract

In this letter we propose a piece-wise linear (PL) classifier for use as the decision stage in a two-modal verification system comprising a face expert and a speech expert. The classifier uses a fixed decision boundary specifically designed to account for the effects of noisy audio conditions. Experimental results on the VidTIMIT database show that in clean conditions the proposed classifier is outperformed by a traditional weighted summation decision stage (with either fixed or adaptive weights). When the audio data is corrupted with white Gaussian noise, the PL classifier performs better than the fixed-weight approach and similarly to the adaptive-weight approach. With a more realistic noise type, namely "operations room" noise from the NOISEX-92 corpus, the PL classifier outperforms both the fixed and adaptive approaches. The better results in this case stem from the PL classifier making no direct assumption about the type of noise that causes the mismatch between training and testing conditions (unlike the adaptive approach). Moreover, the PL classifier has the advantage of a fixed (non-adaptive, and thus simpler) structure.
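To make the contrast between the two kinds of decision stage concrete, the following is a minimal sketch, not the boundary or the parameters used in the letter. It assumes each expert outputs a single real-valued opinion score where higher means "more likely the true claimant". The function names, the weight, the threshold, and the knot positions are all hypothetical and chosen only for illustration.

```python
def weighted_sum_accept(face_score, speech_score, w_face=0.5, threshold=0.0):
    """Fixed-weight summation: accept when the fused score reaches the threshold.

    w_face and threshold are illustrative values, not taken from the letter.
    """
    fused = w_face * face_score + (1.0 - w_face) * speech_score
    return fused >= threshold


def boundary_speech_score(face_score, knots):
    """Evaluate a piece-wise linear boundary given as (face, speech) knots.

    knots must be sorted by face score; outside the knot range the boundary
    is held flat at the nearest end point (an assumption of this sketch).
    """
    if face_score <= knots[0][0]:
        return knots[0][1]
    if face_score >= knots[-1][0]:
        return knots[-1][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= face_score <= x1:
            t = (face_score - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)


def piecewise_linear_accept(face_score, speech_score, knots):
    """Accept when the speech score lies on or above the fixed PL boundary."""
    return speech_score >= boundary_speech_score(face_score, knots)


# Hypothetical boundary: it drops steeply for strong face scores, so a claim
# with a good face score is still accepted when noise depresses the speech score.
EXAMPLE_KNOTS = [(-1.0, 1.0), (0.0, 0.0), (1.0, -3.0)]

if __name__ == "__main__":
    face, noisy_speech = 0.8, -1.5
    print(weighted_sum_accept(face, noisy_speech))
    print(piecewise_linear_accept(face, noisy_speech, EXAMPLE_KNOTS))
```

In this toy setting the fixed-weight rule rejects the claim because the degraded speech score drags the fused score below the threshold, while the fixed PL boundary still accepts it. Neither rule adapts its parameters at test time, which is the structural simplicity the abstract refers to.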