On-line multi-modal speaker diarization

Authors:
Athanasios Noulas;Ben J. A. Krose
Affiliations:
University of Amsterdam, Amsterdam, Netherlands;University of Amsterdam, Amsterdam, Netherlands
Venue:
Proceedings of the 9th international conference on Multimodal interfaces
Year:
2007

Abstract

This paper presents a novel framework that uses multi-modal information for speaker diarization. Dynamic Bayesian networks allow the results to be produced on-line. The framework progresses from a simple observation model to a complex multi-modal one as more data becomes available, and the early results obtained with the simple model are used to guide the learning procedure of the complex model efficiently. We report results from various real-world situations, including video from web cameras, human-computer interaction, and video conferences.
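
The abstract does not specify the structure of the dynamic Bayesian network, so the following is only a minimal sketch of what on-line inference in the simplest such network (a hidden Markov model over the active speaker) might look like. The function names, the two-speaker transition matrix, and the per-frame likelihood values are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
# Illustrative only: on-line forward filtering in a discrete-state DBN (an HMM).
# All names and numbers here are assumptions, not the paper's model.

def forward_step(belief, transition, likelihood):
    """Update P(speaker_t | observations up to t) from the previous belief.

    belief:     P(speaker_{t-1} | observations up to t-1), length K
    transition: transition[i][j] = P(speaker_t = j | speaker_{t-1} = i)
    likelihood: likelihood[j] = P(observation_t | speaker_t = j)
    """
    k = len(belief)
    # Predict: propagate the previous belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(k))
                 for j in range(k)]
    # Correct: weight by how well each speaker explains the new observation.
    unnormalized = [predicted[j] * likelihood[j] for j in range(k)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def diarize_online(likelihoods, transition, prior):
    """Label each frame as it arrives, using only past and current frames."""
    belief = list(prior)
    labels = []
    for likelihood in likelihoods:
        belief = forward_step(belief, transition, likelihood)
        labels.append(max(range(len(belief)), key=belief.__getitem__))
    return labels


# Hypothetical two-speaker example: speakers tend to keep the floor (0.9).
transition = [[0.9, 0.1],
              [0.1, 0.9]]
prior = [0.5, 0.5]
# Hypothetical per-frame observation likelihoods for speakers 0 and 1.
likelihoods = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.1, 0.9]]

labels = diarize_online(likelihoods, transition, prior)
print(labels)
```

Because each update uses only the previous belief and the current observation, a label is available as soon as a frame arrives, which is the property that makes this kind of model suitable for on-line use. In a multi-modal setting the likelihood term would combine audio and video evidence; how the paper does this is not stated in the abstract.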