The performance of deployed audio, face, and multi-modal person recognition systems in non-controlled scenarios is typically lower than that of systems developed in highly controlled environments. To facilitate the development of robust audio, face, and multi-modal person recognition systems, the new large and realistic multi-modal (audio-visual) VALID database was acquired in a noisy “real world” office scenario with no control over illumination or acoustic noise. In this paper we describe the acquisition and content of the VALID database, which consists of five recording sessions of 106 subjects over a period of one month. Speaker identification experiments using visual speech features extracted from the mouth region are reported, and performance on the uncontrolled VALID database is compared with that on the controlled XM2VTS database. The best VALID- and XM2VTS-based accuracies are 63.21% and 97.17% respectively, highlighting the degrading effect of an uncontrolled illumination environment and the importance of this database for developing real-world applications. The VALID database is available to the academic community at http://ee.ucd.ie/validdb/.
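The experiments above rely on visual speech features extracted from the mouth region. The abstract does not specify the exact feature set, so the following is a minimal sketch of one standard choice for such features: low-order 2-D DCT coefficients of an intensity-normalised mouth region-of-interest. The function names, the normalisation scheme, and the coefficient count are illustrative assumptions, not the method used in the VALID experiments.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)  # DC row uses its own scaling
    return d

def mouth_dct_features(mouth_roi, n_coeffs=15):
    """Low-order 2-D DCT coefficients of a grayscale mouth ROI.

    Illustrative sketch of a common visual-speech feature; the actual
    feature set used in the VALID experiments may differ.
    """
    roi = np.asarray(mouth_roi, dtype=np.float64)
    # Zero-mean, unit-variance normalisation reduces (but does not
    # remove) sensitivity to global illumination changes.
    roi = (roi - roi.mean()) / (roi.std() + 1e-8)
    h, w = roi.shape
    # Separable 2-D DCT: D_h @ X @ D_w^T
    coeffs = dct_matrix(h) @ roi @ dct_matrix(w).T
    # Keep the first n_coeffs coefficients in zig-zag (low-frequency
    # first) order, as in JPEG-style coefficient selection.
    order = sorted(((i, j) for i in range(h) for j in range(w)),
                   key=lambda p: (p[0] + p[1], p[0]))
    return np.array([coeffs[i, j] for i, j in order[:n_coeffs]])
```

In a full pipeline such a vector would typically be computed per video frame and augmented with temporal derivatives before being fed to a speaker classifier; the per-frame normalisation here is one reason controlled databases such as XM2VTS yield much higher accuracy than uncontrolled ones like VALID, where illumination varies within and across sessions.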