Visual information from captured video is important for speaker identification in noisy conditions, such as background noise or cross talk among speakers. In this paper, we propose local spatiotemporal descriptors to represent and recognize speakers based solely on visual features. Spatiotemporal dynamic texture features, computed as local binary patterns from localized mouth regions, describe the motion information in utterances and capture its spatial and temporal transition characteristics. Structural edge map features are extracted from the image frames to represent appearance characteristics. Combining dynamic texture and structural features takes both motion and appearance into account, making it possible to describe the spatiotemporal development of speech. In our experiments on the BANCA and XM2VTS databases, the proposed method obtained promising recognition results compared with other features.
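
To make the idea of a spatiotemporal local binary pattern descriptor more concrete, the following is a minimal sketch, not the implementation used in the paper. It assumes one common formulation: basic 8-neighbour LBP codes computed on three orthogonal planes (XY, XT, YT) of a grayscale mouth-region volume, with the three normalized histograms concatenated. The function names `lbp_codes`, `plane_histogram`, and `lbp_top` are hypothetical, and the abstract does not specify the neighbourhood, radius, or block layout, so those choices are assumptions. Mouth localization, edge map features, and the classifier are left out.

```python
import numpy as np


def lbp_codes(img):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2D array."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (row, col), clockwise starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = img[1 + dr:h - 1 + dr, 1 + dc:w - 1 + dc]
        # Set this bit where the neighbour is at least as bright as the center.
        codes += (neighbour >= center) * (1 << bit)
    return codes


def plane_histogram(planes):
    """Normalized 256-bin histogram of LBP codes pooled over a set of planes."""
    hist = np.zeros(256, dtype=np.float64)
    for plane in planes:
        hist += np.bincount(lbp_codes(plane).ravel(), minlength=256)
    return hist / hist.sum()


def lbp_top(volume):
    """Concatenate LBP histograms from the XY, XT, and YT planes.

    `volume` has shape (T, H, W); each dimension must be at least 3.
    """
    volume = np.asarray(volume, dtype=np.float64)
    t, h, w = volume.shape
    xy = (volume[i] for i in range(t))        # appearance in each frame
    xt = (volume[:, j, :] for j in range(h))  # horizontal motion over time
    yt = (volume[:, :, k] for k in range(w))  # vertical motion over time
    return np.concatenate([plane_histogram(p) for p in (xy, xt, yt)])
```

For a hypothetical mouth-region clip stored as an array of shape `(T, H, W)`, `lbp_top(clip)` returns a feature vector of length 768 (three 256-bin histograms). In practice the region would usually be divided into blocks and a histogram computed per block to retain location information; that step is omitted here for brevity.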