This paper presents a novel framework that uses multi-modal information to perform speaker diarization. We use dynamic Bayesian networks so that results are produced on-line. As more data becomes available, we progress from a simple observation model to a complex multi-modal one, and we present an efficient way to guide the learning of the complex model using the early results obtained with the simple model. We report results in several real-world settings, including webcam videos, human-computer interaction, and video conferences.
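As a rough illustration of what on-line inference in a dynamic Bayesian network involves, the sketch below assumes the simplest such network: a hidden Markov chain over which speaker is active, with per-frame observation likelihoods already supplied. It is not the paper's model. The two-speaker setup, the transition probabilities, and the likelihood values are hypothetical, and the paper's audio and video observation models and its simple-to-complex model schedule are not reproduced here. The function `forward_filter` is a hypothetical name for standard causal (forward) filtering.

```python
def forward_filter(transition, prior, likelihoods):
    """Causal forward filtering in a discrete hidden Markov chain (a minimal DBN).

    transition[i][j]: P(speaker j active at t | speaker i active at t-1)
    prior[i]:         P(speaker i active at the first frame)
    likelihoods[t][i]: p(observation at frame t | speaker i active), assumed given
    Returns one posterior per frame: P(speaker i active at t | frames 0..t).
    """
    n = len(prior)
    belief = list(prior)
    posteriors = []
    for t, lik in enumerate(likelihoods):
        if t > 0:
            # Predict: push the previous posterior through the transition model.
            prev = posteriors[-1]
            belief = [sum(prev[i] * transition[i][j] for i in range(n)) for j in range(n)]
        # Update: weight the prediction by how well each speaker explains frame t.
        unnorm = [belief[j] * lik[j] for j in range(n)]
        total = sum(unnorm)
        posteriors.append([u / total for u in unnorm])
    return posteriors


# Hypothetical example: two speakers, "sticky" transitions, made-up likelihoods.
transition = [[0.9, 0.1], [0.1, 0.9]]
prior = [0.5, 0.5]
likelihoods = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]

posteriors = forward_filter(transition, prior, likelihoods)
active = [max(range(len(p)), key=p.__getitem__) for p in posteriors]
print(active)  # [0, 0, 1]
```

Each posterior uses only frames up to the current one, which is what makes the estimate available on-line; a richer observation model would change only how `likelihoods` is computed, not the filtering loop.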