Multistream dynamic Bayesian network for meeting segmentation

  • Authors:
  • Alfred Dielmann; Steve Renals

  • Affiliations:
  • Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK (both authors)

  • Venue:
  • MLMI '04: Proceedings of the First International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2004

Abstract

This paper investigates the automatic analysis and segmentation of meetings. A meeting is analysed in terms of individual behaviours and group interactions, in order to decompose each meeting into a sequence of relevant phases, named meeting actions. Three feature families are extracted from multimodal recordings: prosody from individual lapel microphone signals, speaker activity from microphone array data, and lexical features from textual transcripts. A statistical approach is then used to relate these low-level features to a set of abstract categories. To provide a flexible and powerful framework, we have employed a model based on a dynamic Bayesian network, characterised by multiple stream processing and flexible state duration modelling. Experimental results demonstrate the strength of this system, which achieves a meeting action error rate of 9%.
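The core idea in the abstract is that several parallel feature streams are combined within a single state-sequence model. The paper's actual model is a dynamic Bayesian network with flexible state-duration modelling; the sketch below is only a simplified stand-in for the multistream part: a Viterbi decoder in which each hidden state (a meeting action) emits one discrete symbol per stream, and the streams are assumed conditionally independent given the state, so per-stream log-likelihoods add. All dimensions, tables, and names here are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal multistream decoding sketch, NOT the authors' DBN:
# each hidden state is a meeting action; each observation stream
# (e.g. prosodic, speaker-activity, lexical) is assumed conditionally
# independent given the state. All quantities below are illustrative.

rng = np.random.default_rng(0)

n_actions = 4   # hypothetical number of meeting actions
n_streams = 3   # prosody, speaker activity, lexical (after quantisation)
n_symbols = 5   # discrete symbols per stream
T = 50          # number of time frames

# Random row-stochastic transition and per-stream emission tables.
A = rng.dirichlet(np.ones(n_actions), size=n_actions)               # (S, S)
B = rng.dirichlet(np.ones(n_symbols), size=(n_streams, n_actions))  # (K, S, V)
pi = np.full(n_actions, 1.0 / n_actions)                            # uniform prior

# Synthetic observations: one discrete symbol per stream per frame.
obs = rng.integers(0, n_symbols, size=(n_streams, T))

def multistream_viterbi(obs, pi, A, B):
    """Viterbi decoding where the frame log-likelihood is the sum of the
    per-stream emission log-probabilities (streams independent given state)."""
    K, T = obs.shape
    S = A.shape[0]
    # Per-frame, per-state log-likelihood, summed over streams.
    loglik = np.zeros((T, S))
    for k in range(K):
        loglik += np.log(B[k][:, obs[k]].T)   # (T, S)
    delta = np.log(pi) + loglik[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)   # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + loglik[t]
    # Backtrace the most probable action sequence.
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

print(multistream_viterbi(obs, pi, A, B))
```

A faithful implementation of the paper's model would additionally carry a state-duration variable in the network to realise the flexible duration modelling the abstract mentions; the plain first-order transition matrix above does not capture that.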