Combining dynamic texture and structural features for speaker identification

  • Authors:
  • Guoying Zhao;Xiaohua Huang;Yulia Gizatdinova;Matti Pietikäinen

  • Affiliations:
  • University of Oulu, Oulu, Finland;University of Oulu, Oulu, Finland;Tampere University, Tampere, Finland;University of Oulu, Oulu, Finland

  • Venue:
  • Proceedings of the 2nd ACM workshop on Multimedia in forensics, security and intelligence
  • Year:
  • 2010

Abstract

Visual information from captured video is important for speaker identification in noisy conditions, such as background noise or cross talk among speakers. In this paper, we propose local spatiotemporal descriptors to represent and recognize speakers based solely on visual features. Spatiotemporal dynamic texture features, computed as local binary patterns from localized mouth regions, describe the motion information in utterances and capture its spatial and temporal transition characteristics. Structural edge map features are extracted from the image frames to represent appearance characteristics. Combining dynamic texture and structural features takes both motion and appearance into account, making it possible to describe the spatiotemporal development of speech. In our experiments on the BANCA and XM2VTS databases, the proposed method obtained promising recognition results compared with other features.
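
To make the idea of a spatiotemporal local binary pattern descriptor concrete, the following is a minimal sketch and not the authors' implementation. It assumes a grayscale mouth-region clip stored as a NumPy array of shape (T, H, W), uses a basic 8-neighbour LBP with radius 1, and accumulates one histogram per plane orientation (XY, XT, YT) before concatenating them. The function names `lbp_codes`, `plane_histogram`, and `lbp_three_planes` are hypothetical, and the abstract does not specify the neighbourhood, radius, or any block partitioning of the mouth region.

```python
import numpy as np


def lbp_codes(plane):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2D array."""
    plane = np.asarray(plane, dtype=np.float64)
    h, w = plane.shape
    center = plane[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = plane[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit where the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes


def plane_histogram(volume, axis):
    """Normalised 256-bin LBP histogram over all slices taken along `axis`."""
    hist = np.zeros(256, dtype=np.float64)
    for i in range(volume.shape[axis]):
        codes = lbp_codes(np.take(volume, i, axis=axis))
        hist += np.bincount(codes.ravel(), minlength=256)
    total = hist.sum()
    return hist / total if total > 0 else hist


def lbp_three_planes(volume):
    """Concatenate XY, XT and YT LBP histograms of a (T, H, W) volume.

    Fixing axis 0 (time) yields XY slices (appearance); fixing axis 1 (y)
    yields XT slices and fixing axis 2 (x) yields YT slices (motion).
    """
    volume = np.asarray(volume)
    return np.concatenate([plane_histogram(volume, a) for a in (0, 1, 2)])
```

Calling `lbp_three_planes(clip)` on a clip of shape (T, H, W) returns a 768-dimensional vector (three 256-bin histograms). In this sketch the XY histogram carries appearance information while the XT and YT histograms carry temporal transitions; a structural feature such as an edge map descriptor would be computed separately and combined with this vector.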