Audio-visual human recognition using semi-supervised spectral learning and hidden Markov models

Authors:
Wei Feng; Lei Xie; Jia Zeng; Zhi-Qiang Liu
Affiliations:
Media Computing Group, School of Creative Media, City University of Hong Kong, Hong Kong, China; School of Computer Science, Northwestern Polytechnical University, Xi'an, China; Department of Computer Science, Hong Kong Baptist University, Hong Kong, China; Media Computing Group, School of Creative Media, City University of Hong Kong, Hong Kong, China
Venue:
Journal of Visual Languages and Computing
Year:
2009

Abstract

This paper presents a multimodal system for reliable human identity recognition under varying conditions. The system fuses face and speech recognition within a general probabilistic framework. For face recognition, we propose a new spectral learning algorithm that considers not only the discriminative relations among the training data but also a generative model for each class. Because labeling faces is costly in practice, our spectral face learning adopts a semi-supervised strategy: only a small number of labeled faces are used in training, and their labels are optimally propagated to the remaining unlabeled training faces. Besides requiring far less labeled data, the algorithm also provides a natural way to explicitly train an outlier model that approximately represents unauthorized faces. To improve robustness across environments, face recognition is complemented by a speaker identification agent, which models the statistical variations of fixed-phrase speech using speaker-dependent word hidden Markov models. Experiments on benchmark databases validate the effectiveness of the face recognition and speaker identification agents and demonstrate that integrating these two independent biometric sources noticeably improves recognition accuracy.
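
The fusion of two independent biometric sources described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion rule, so the sketch assumes the simplest probabilistic choice, namely that the face and speech scores are conditionally independent given the identity, so their log-likelihoods add. The function names, the stream weights, the uniform prior, the "outlier" entry on the speech side, and all numeric scores are hypothetical.

```python
import math


def fuse_log_likelihoods(face_ll, speech_ll, face_weight=1.0, speech_weight=1.0):
    """Combine per-identity log-likelihoods from the two modalities.

    Assumption (not taken from the paper): the modalities are conditionally
    independent given the identity, so the joint log-likelihood is the sum
    of the per-modality log-likelihoods. The weights are hypothetical
    reliability factors; 1.0 for both gives the plain independent product.
    """
    if face_ll.keys() != speech_ll.keys():
        raise ValueError("both modalities must score the same identities")
    return {
        name: face_weight * face_ll[name] + speech_weight * speech_ll[name]
        for name in face_ll
    }


def posterior(fused_ll):
    """Normalize fused scores into posteriors under a uniform prior.

    Uses the log-sum-exp trick so that very negative scores do not underflow.
    """
    top = max(fused_ll.values())
    log_z = top + math.log(sum(math.exp(v - top) for v in fused_ll.values()))
    return {name: math.exp(v - log_z) for name, v in fused_ll.items()}


def identify(face_ll, speech_ll):
    """Return the most probable identity and its posterior probability."""
    post = posterior(fuse_log_likelihoods(face_ll, speech_ll))
    best = max(post, key=post.get)
    return best, post[best]


# Made-up scores for illustration only. "outlier" stands in for a model of
# unauthorized users; a speech-side outlier score is an assumption here.
face_scores = {"alice": -10.0, "bob": -12.0, "outlier": -11.0}
speech_scores = {"alice": -20.0, "bob": -19.5, "outlier": -25.0}

best_identity, confidence = identify(face_scores, speech_scores)
print(best_identity, round(confidence, 3))
```

In this toy input the speech scores alone slightly favor "bob" while the face scores favor "alice"; summing the log-likelihoods lets the more decisive modality dominate. A claim would be rejected whenever the "outlier" entry receives the highest fused score.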