Many references address a specific aspect of speaker tracking. This paper focuses on speaker modeling and proposes conditional random fields (CRFs) for this purpose. A CRF is an undirected graphical model for classifying sequential data, and its properties make it attractive for speaker modeling and tracking. The main difficulty with the CRF model is its training: known training approaches are prone to overfitting and unreliable convergence. To address this problem, this paper proposes variational approaches; its main novelty is the adaptation of the variational framework to CRF training. The resulting approach is evaluated in three settings. First, the best CRF model configuration for speaker modeling is identified through evaluation on text-independent speaker verification. Next, the selected model is used in a speaker detection task, in which the models of the speakers in the conversation are known a priori. Finally, the proposed CRF approach is compared with a Gaussian mixture model (GMM) in an online speaker tracking framework. The results show that the proposed CRF model is superior to the GMM in speaker detection and tracking, owing to its capability for sequence modeling and segmentation.
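
For readers unfamiliar with the model class, the standard linear-chain CRF can be written as below. This is the textbook form, offered only as a sketch: the notation is assumed here and does not come from the paper. In a speaker tracking setting, x could stand for a sequence of T acoustic feature frames and y for the per-frame speaker labels; the feature functions f_k and weights lambda_k are generic placeholders, and the configuration actually used in the paper may differ.

```latex
% Standard linear-chain CRF (notation assumed for illustration, not taken from the paper)
% x = (x_1, ..., x_T): observation sequence;  y = (y_1, ..., y_T): label sequence
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \right),
\qquad
Z(\mathbf{x})
  = \sum_{\mathbf{y}'}
    \exp\!\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \right)
```

Because the normalizer Z(x) sums over entire label sequences, the model scores a labeling of the whole sequence jointly rather than frame by frame, which is the sequence modeling and segmentation property the abstract refers to.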