Variational conditional random fields for online speaker detection and tracking

Authors:
M. H. Moattar;M. M. Homayounpour
Affiliations:
Laboratory for Intelligent Sound and Speech Processing, Computer Engineering and Information Technology Dept., Amirkabir University of Technology, Tehran, Iran (both authors)
Venue:
Speech Communication
Year:
2012

Abstract

Many studies address a specific aspect of speaker tracking. This paper focuses on speaker modeling and proposes conditional random fields (CRFs) for this purpose. CRFs are a class of undirected graphical models for classifying sequential data, and their properties motivate their use for speaker modeling and tracking. The main difficulty with CRFs is training: known training approaches are prone to overfitting and unreliable convergence. To address this problem, the paper proposes variational approaches; its main novelty is the adaptation of the variational framework to CRF training. The resulting approach is evaluated on three tasks. First, the best CRF model configuration for speaker modeling is selected on text-independent speaker verification. Next, the selected model is used in a speaker detection task, in which the models of the speakers in the conversation are known a priori. Finally, the proposed CRF approach is compared with Gaussian mixture models (GMMs) in an online speaker tracking framework. The results show that the proposed CRF model is superior to the GMM in speaker detection and tracking, owing to its capability for sequence modeling and segmentation.
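
As an illustration of what "classifying sequential data" with a CRF means, the following is a minimal sketch of a standard linear-chain CRF that scores a label sequence and normalizes it with the forward algorithm. It is not the variational training procedure proposed in the paper, and it does no training at all. The per-frame scores (emit), the speaker-to-speaker transition scores (trans), the two-speaker setup, and all function names are hypothetical values chosen only for the example.

```python
import math
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(emit, trans, labels):
    """Unnormalized log-score of one label sequence (hypothetical potentials)."""
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score


def log_partition(emit, trans):
    """Log of the normalizer Z(x), computed with the forward algorithm."""
    num_labels = len(trans)
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][j]
            + logsumexp([alpha[i] + trans[i][j] for i in range(num_labels)])
            for j in range(num_labels)
        ]
    return logsumexp(alpha)


def log_conditional(emit, trans, labels):
    """log p(labels | observations) for a linear-chain CRF."""
    return sequence_score(emit, trans, labels) - log_partition(emit, trans)


# Toy example (assumed numbers): 3 frames, 2 candidate speakers.
emit = [[1.0, 0.2], [0.5, 0.4], [0.1, 1.3]]
trans = [[0.8, -0.5], [-0.5, 0.8]]  # staying with the same speaker is favored

# Sanity check: probabilities of all 2**3 label sequences sum to one.
total = sum(
    math.exp(log_conditional(emit, trans, list(y)))
    for y in product(range(2), repeat=3)
)
```

Because the normalizer sums over whole label sequences, the model scores a speaker labeling of the entire segment jointly rather than frame by frame, which is the sequence-modeling property the abstract credits for the advantage over the GMM.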