Multistream dynamic Bayesian network for meeting segmentation

  • Authors:
  • Alfred Dielmann; Steve Renals

  • Affiliations:
  • Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK (both authors)

  • Venue:
  • MLMI '04: Proceedings of the First International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2004

Abstract

This paper investigates the automatic analysis and segmentation of meetings. A meeting is analysed in terms of individual behaviours and group interactions, in order to decompose each meeting into a sequence of relevant phases, named meeting actions. Three feature families are extracted from multimodal recordings: prosody from individual lapel microphone signals, speaker activity from microphone array data, and lexical features from textual transcripts. A statistical approach is then used to relate these low-level features to a set of abstract categories. To provide a flexible and powerful framework, we have employed a model based on a dynamic Bayesian network, characterised by multiple stream processing and flexible state duration modelling. Experimental results demonstrate the strength of this system, which achieves a meeting action error rate of 9%.
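The core idea in the abstract is that several parallel feature streams are combined within a single state-sequence model. The paper's actual model is a dynamic Bayesian network with flexible state-duration modelling; the sketch below is only a simplified stand-in for the multistream part: a Viterbi decoder in which each hidden state (a meeting action) emits one discrete symbol per stream, and the streams are assumed conditionally independent given the state, so per-stream log-likelihoods add. All dimensions, tables, and names here are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal multistream decoding sketch, NOT the authors' DBN:
# each hidden state is a meeting action; each observation stream
# (e.g. prosodic, speaker-activity, lexical) is assumed conditionally
# independent given the state. All quantities below are illustrative.

rng = np.random.default_rng(0)

n_actions = 4   # hypothetical number of meeting actions
n_streams = 3   # prosody, speaker activity, lexical (after quantisation)
n_symbols = 5   # discrete symbols per stream
T = 50          # number of time frames

# Random row-stochastic transition and per-stream emission tables.
A = rng.dirichlet(np.ones(n_actions), size=n_actions)               # (S, S)
B = rng.dirichlet(np.ones(n_symbols), size=(n_streams, n_actions))  # (K, S, V)
pi = np.full(n_actions, 1.0 / n_actions)                            # uniform prior

# Synthetic observations: one discrete symbol per stream per frame.
obs = rng.integers(0, n_symbols, size=(n_streams, T))

def multistream_viterbi(obs, pi, A, B):
    """Viterbi decoding where the frame log-likelihood is the sum of the
    per-stream emission log-probabilities (streams independent given state)."""
    K, T = obs.shape
    S = A.shape[0]
    # Per-frame, per-state log-likelihood, summed over streams.
    loglik = np.zeros((T, S))
    for k in range(K):
        loglik += np.log(B[k][:, obs[k]].T)   # (T, S)
    delta = np.log(pi) + loglik[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)   # scores[i, j]: best path ending i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + loglik[t]
    # Backtrace the most probable action sequence.
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

print(multistream_viterbi(obs, pi, A, B))
```

A faithful implementation of the paper's model would additionally carry a state-duration variable in the network to realise the flexible duration modelling the abstract mentions; the plain first-order transition matrix above does not capture that.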