PSO based optimized reliability for robust multimodal speaker identification

Authors:
Md. Tariquzzaman;Jin Young Kim;Seung You Na
Affiliations:
School of Electronics and Computer Engineering, Chonnam National University, Gwangju, South Korea;School of Electronics and Computer Engineering, Chonnam National University, Gwangju, South Korea;School of Electronics and Computer Engineering, Chonnam National University, Gwangju, South Korea
Venue:
CISST'10 Proceedings of the 4th WSEAS international conference on Circuits, systems, signal and telecommunications
Year:
2010

Abstract

Reliable speaker recognition in real environments is a key challenge for ubiquitous services in human-computer interfaces. In this paper, we present a robust multimodal speaker identification system that optimizes the reliability of the different modalities. We propose an extension of the modified convection function's optimizing factors to account for optimum reliability simultaneously in audio, face, and lip information. The proposed reliability measure is applied to a multimodal speaker identification framework for robust speaker identification. The particle swarm optimization (PSO) algorithm is employed to optimize the modified convection function's optimizing factors. In the face-based expert, image quality is degraded with JPEG compression in the enrollment and test sessions. Similarly, the image quality of the lip-based expert is degraded to create a mismatch between enrollment and test images. Finally, artificial illumination from the opposite direction is added to the test face and lip images at different intensities. The VidTIMIT audio DB, which was collected in an office environment, has a high level of signal distortion. We apply local principal component analysis (Local PCA) to both the face and lip modalities to reduce the dimension of the feature vectors. All speaker identification experiments are performed on the VidTIMIT DB. Experimental results show that the proposed optimum reliability measures improve the identification rate (IR) by 8.67% compared with the best single classifier, i.e., the audio classifier, and, most notably, preserve the consistency of the multimodal integration framework.
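
The idea of tuning per-modality reliability factors with PSO can be illustrated with a minimal sketch. The abstract does not give the form of the modified function or its optimizing factors, so the example below substitutes a plain weighted sum of per-modality scores as a stand-in, and it runs on synthetic scores rather than the VidTIMIT data. All function names, parameter values, and noise levels are hypothetical and chosen only for illustration.

```python
import random

def fuse(scores, weights):
    """Weighted sum of per-modality scores for each enrolled speaker.

    Assumption: a linear fusion rule stands in for the paper's reliability
    function, which the abstract does not specify.
    """
    n_speakers = len(scores[0])
    return [sum(w * s[k] for w, s in zip(weights, scores))
            for k in range(n_speakers)]

def identification_rate(trials, weights):
    """Fraction of trials whose top fused score belongs to the true speaker."""
    correct = 0
    for scores, true_id in trials:
        fused = fuse(scores, weights)
        if max(range(len(fused)), key=fused.__getitem__) == true_id:
            correct += 1
    return correct / len(trials)

def pso(objective, dim, n_particles=20, n_iters=30,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `objective` over [0, 1]^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                # personal best positions
    best_val = [objective(p) for p in pos]        # personal best values
    g = max(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Keep each weight inside the unit interval.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val > gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val

def make_toy_trials(n_trials=200, n_speakers=5,
                    sigmas=(0.8, 1.5, 2.5), seed=1):
    """Synthetic (audio, face, lip) scores; the noise levels are made up."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        true_id = rng.randrange(n_speakers)
        scores = [[(1.0 if k == true_id else 0.0) + rng.gauss(0.0, s)
                   for k in range(n_speakers)] for s in sigmas]
        trials.append((scores, true_id))
    return trials

trials = make_toy_trials()
weights, rate = pso(lambda w: identification_rate(trials, w), dim=3)
print("weights (audio, face, lip):", [round(w, 3) for w in weights])
print("identification rate:", round(rate, 3))
```

In this toy setting the swarm searches the three fusion weights directly and uses the identification rate on the synthetic trials as its fitness. A real system would evaluate the fitness on held-out enrollment and test data and would replace the linear rule with the reliability function defined in the paper.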