PSO based optimized reliability for robust multimodal speaker identification

  • Authors:
  • Md. Tariquzzaman;Jin Young Kim;Seung You Na

  • Affiliations:
  • School of Electronics and Computer Engineering, Chonnam National University, Gwangju, South Korea;School of Electronics and Computer Engineering, Chonnam National University, Gwangju, South Korea;School of Electronics and Computer Engineering, Chonnam National University, Gwangju, South Korea

  • Venue:
  • CISST'10 Proceedings of the 4th WSEAS international conference on Circuits, systems, signal and telecommunications
  • Year:
  • 2010

Abstract

Reliable speaker recognition in real environments is a key challenge for ubiquitous services in human-computer interfaces. In this paper, we present a robust multimodal speaker identification system that optimizes the reliability of its different modalities. We propose an extension of the modified convection function's optimizing factors to account for optimum reliability simultaneously in audio, face and lip information. The proposed reliability measure is applied to a multimodal speaker identification framework for robust speaker identification. The particle swarm optimization (PSO) algorithm is employed to optimize the modified convection function's optimizing factors. For the face-based expert, image quality was degraded with JPEG compression in both the enrollment and test sessions. Similarly, the image quality of the lip-based expert was degraded to create a mismatch between enrollment and test images. Finally, artificial illumination from the opposite direction was added to the test face and lip images at different intensities. The VidTimit audio DB, which was collected in an office environment, has a high level of signal distortion. We applied local principal component analysis (Local PCA) to both the face and lip modalities to reduce the dimension of the feature vectors. All speaker identification experiments were performed using the VidTimit DB. Experimental results show that the proposed optimum reliability measures improve the identification rate (IR) by 8.67% compared with the best single classifier, i.e., the audio classifier, and, most notably, retain the consistency of the multimodal integration framework.
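
The abstract does not give the form of the modified convection function or the PSO settings, so the following is only a minimal sketch of how a particle swarm could search for per-modality reliability weights. Everything in it is an assumption made for illustration: the names `fuse_scores` and `pso_minimize`, the weighted-sum fusion rule, the inertia and acceleration constants, and the toy objective. It is not the authors' method.

```python
import random


def fuse_scores(weights, scores):
    """Weighted sum of per-modality scores (hypothetical fusion rule).

    weights: one reliability weight per modality (e.g. audio, face, lip).
    scores:  one list per modality, each holding a score per enrolled speaker.
    """
    n_speakers = len(scores[0])
    return [sum(w * s[k] for w, s in zip(weights, scores))
            for k in range(n_speakers)]


def pso_minimize(objective, dim, n_particles=30, iters=100, lo=0.0, hi=1.0,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO over the box [lo, hi]^dim (assumed settings)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position of the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Pull each particle toward its own best and the swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Keep the weight inside the allowed range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Toy objective standing in for the identification error on held-out data.
    toy_error = lambda w: sum((wi - 0.3) ** 2 for wi in w)
    weights, err = pso_minimize(toy_error, dim=3)
    print(weights, err)
```

In an actual system the objective would presumably be the identification error obtained by fusing the audio, face and lip scores with the candidate weights on a development set; the quadratic here only shows that the search loop runs.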