Combining dynamic texture and structural features for speaker identification

Authors:
Guoying Zhao;Xiaohua Huang;Yulia Gizatdinova;Matti Pietikäinen
Affiliations:
University of Oulu, Oulu, Finland;University of Oulu, Oulu, Finland;Tampere University, Tampere, Finland;University of Oulu, Oulu, Finland
Venue:
Proceedings of the 2nd ACM workshop on Multimedia in forensics, security and intelligence
Year:
2010


Abstract

Visual information from captured video is important for speaker identification in noisy conditions, such as background noise or cross-talk among speakers. In this paper, we propose local spatiotemporal descriptors to represent and recognize speakers based solely on visual features. Spatiotemporal dynamic texture features, computed as local binary patterns from localized mouth regions, describe the motion information in utterances and capture their spatial and temporal transition characteristics. Structural edge map features are extracted from the image frames to represent appearance characteristics. Combining dynamic texture and structural features takes both motion and appearance into account, making it possible to describe the spatiotemporal development of speech. In our experiments on the BANCA and XM2VTS databases, the proposed method obtained promising recognition results compared with other features.
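
The feature combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give neighbourhood sizes, block divisions, the edge detector, or the classifier, so everything below is an assumption chosen for brevity. The sketch computes basic 8-neighbour local binary patterns on three orthogonal planes (XY, XT, YT) of a grayscale mouth-region volume as the dynamic texture part, uses a thresholded gradient orientation histogram as a stand-in for the structural edge map features, and concatenates the two. The function names, the neighbour ordering, `bins`, and `threshold` are all hypothetical.

```python
import math

# Eight neighbours on a radius-1 ring; the ordering is an arbitrary assumption.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_top(volume):
    """Dynamic texture part: normalised LBP histograms on the XY, XT and YT
    planes of volume[t][y][x], concatenated (3 * 256 values).
    Assumes at least 3 frames of at least 3x3 pixels."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    hists = [[0] * 256 for _ in range(3)]
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                center = volume[t][y][x]
                codes = [0, 0, 0]
                for bit, (da, db) in enumerate(OFFSETS):
                    if volume[t][y + da][x + db] >= center:  # XY plane
                        codes[0] |= 1 << bit
                    if volume[t + da][y][x + db] >= center:  # XT plane
                        codes[1] |= 1 << bit
                    if volume[t + da][y + db][x] >= center:  # YT plane
                        codes[2] |= 1 << bit
                for hist, code in zip(hists, codes):
                    hist[code] += 1
    n = (T - 2) * (H - 2) * (W - 2)  # number of interior voxels
    return [count / n for hist in hists for count in hist]


def edge_orientation_histogram(frame, bins=8, threshold=1.0):
    """Structural part (stand-in): histogram of gradient orientations at
    pixels whose gradient magnitude reaches `threshold`."""
    hist = [0.0] * bins
    H, W = len(frame), len(frame[0])
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            gx = frame[y][x + 1] - frame[y][x - 1]
            gy = frame[y + 1][x] - frame[y - 1][x]
            if math.hypot(gx, gy) >= threshold:
                angle = math.atan2(gy, gx) % math.pi  # orientation in [0, pi)
                hist[min(int(angle / math.pi * bins), bins - 1)] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else hist


def speaker_descriptor(volume, bins=8):
    """Concatenate motion (LBP on three planes) and appearance (mean
    per-frame edge orientation histogram) features."""
    appearance = [0.0] * bins
    for frame in volume:
        for i, v in enumerate(edge_orientation_histogram(frame, bins)):
            appearance[i] += v / len(volume)
    return lbp_top(volume) + appearance


# Synthetic 4-frame, 5x6 "mouth region" used only to exercise the functions.
demo = [[[(3 * t + 5 * y + 7 * x) % 11 for x in range(6)] for y in range(5)]
        for t in range(4)]
descriptor = speaker_descriptor(demo)
```

In a sketch like this, `descriptor` would be computed per utterance and passed to a classifier of one's choice; concatenation is the simplest way to combine the two feature types and may differ from the fusion used in the paper.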