Dynamic visual features for audio-visual speaker verification

  • Authors:
  • David Dean; Sridha Sridharan

  • Affiliations:
  • Speech, Audio, Image and Video Research Laboratory, Queensland University of Technology, George St., Brisbane, Australia (both authors)

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2010

Abstract

The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we focus on investigating the effectiveness of this technique for the related application of speaker verification. By investigating the speaker verification ability of each stage of the cascade, we demonstrate that the same steps taken to reduce static speaker and environmental information for visual speech recognition also provide similar improvements for visual speaker recognition. A further study compares synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features, showing that the higher complexity inherent in the SHMM approach does not appear to improve the final audio-visual speaker verification system over simpler utterance-level score fusion.
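The utterance-level score fusion mentioned above can be sketched as a simple weighted combination of the per-utterance scores from the two modalities. This is a minimal, hypothetical illustration: the weight and threshold values below are placeholder assumptions, not parameters reported in the paper.

```python
def fuse_scores(audio_score: float, visual_score: float,
                audio_weight: float = 0.6) -> float:
    """Linearly combine per-utterance verification scores from the
    acoustic (e.g. PLP-based) and visual (e.g. CAB-based) classifiers.

    audio_weight is an illustrative assumption; in practice it would
    be tuned on a development set.
    """
    return audio_weight * audio_score + (1.0 - audio_weight) * visual_score


def accept_claim(fused_score: float, threshold: float = 0.0) -> bool:
    """Accept the speaker-identity claim if the fused score exceeds a
    decision threshold (the value 0.0 here is a placeholder)."""
    return fused_score > threshold
```

In this scheme each modality is modelled and scored independently, and the scores are only combined at the end of the utterance; by contrast, an SHMM couples the two streams at the state level during decoding.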