Variational conditional random fields for online speaker detection and tracking

  • Authors:
  • M. H. Moattar; M. M. Homayounpour

  • Affiliations:
  • Laboratory for Intelligent Sound and Speech Processing, Computer Engineering and Information Technology Dept., Amirkabir University of Technology, Tehran, Iran (both authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2012

Abstract

Many existing works address one specific aspect of speaker tracking. This paper focuses on speaker modeling and proposes conditional random fields (CRFs) for this purpose. CRFs are a class of undirected graphical models for classifying sequential data, and they have several characteristics that encouraged us to use them for speaker modeling and tracking. The main difficulty with the CRF model is its training: known approaches to CRF training are prone to overfitting and unreliable convergence. To address this problem, variational approaches are proposed in this paper; the main novelty is the adaptation of the variational framework to CRF training. The resulting approach is evaluated in three settings. First, the best CRF model configuration for speaker modeling is selected using text-independent speaker verification. Next, the selected model is used in a speaker detection task, in which the models of the speakers in the conversation are known a priori. Finally, the proposed CRF approach is compared with Gaussian mixture models (GMMs) in an online speaker tracking framework. The results show that the proposed CRF model is superior to GMM in speaker detection and tracking, owing to its capability for sequence modeling and segmentation.
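
To illustrate why a sequence model can help with segmentation, the following is a minimal sketch of Viterbi decoding over a linear-chain model with two speaker labels. It is not the paper's method: it does not cover the variational training the abstract describes, and the function names, the per-frame scores, and the switching penalty are all hypothetical values chosen for illustration. It only shows how a transition score can smooth a noisy frame-level decision into contiguous speaker segments.

```python
from typing import List, Sequence


def viterbi_decode(emission: Sequence[Sequence[float]],
                   transition: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for a linear-chain model.

    emission[t][k]   : log-score of label k at frame t (hypothetical values)
    transition[j][k] : log-score of moving from label j to label k
    """
    n_frames = len(emission)
    n_labels = len(transition)
    if n_frames == 0:
        return []

    # best[t][k] is the best score of any path ending in label k at frame t
    best = [list(emission[0])]
    back: List[List[int]] = []

    for t in range(1, n_frames):
        scores, pointers = [], []
        for k in range(n_labels):
            # choose the best previous label for the current label k
            prev = max(range(n_labels),
                       key=lambda j: best[-1][j] + transition[j][k])
            pointers.append(prev)
            scores.append(best[-1][prev] + transition[prev][k] + emission[t][k])
        best.append(scores)
        back.append(pointers)

    # trace back from the best final label
    path = [max(range(n_labels), key=lambda k: best[-1][k])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def independent_decode(emission: Sequence[Sequence[float]]) -> List[int]:
    """Label each frame on its own, ignoring neighbouring frames."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in emission]


# Hypothetical scores: six frames, two speakers (labels 0 and 1).
emission = [
    [0.0, -2.0],
    [-1.0, -0.5],  # noisy frame that slightly favours speaker 1
    [0.0, -2.0],
    [-2.0, 0.0],
    [-2.0, 0.0],
    [-2.0, 0.0],
]
transition = [[0.0, -1.5], [-1.5, 0.0]]  # assumed penalty for switching speaker

print(independent_decode(emission))          # frame-by-frame labels
print(viterbi_decode(emission, transition))  # sequence-level labels
```

With these assumed scores, frame-by-frame labeling flips to speaker 1 on the noisy second frame, while the sequence-level decoding keeps a single change point between the third and fourth frames.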