Using audio, visual, and lexical features in a multi-modal virtual meeting director

  • Authors:
  • Marc Al-Hames, Benedikt Hörnler, Christoph Scheuermann, Gerhard Rigoll

  • Affiliation (all authors):
  • Institute for Human-Machine-Communication, Technische Universität München, Munich, Germany

  • Venue:
  • MLMI'06: Proceedings of the Third International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2006

Abstract

Multi-modal recordings of meetings provide the basis for meeting browsing and for remote meetings. However, it is often neither useful nor practical to store or transmit all visual channels. In this work we show how a virtual meeting director selects one of seven possible video modes. We then present several audio, visual, and lexical features for a virtual director. In an experimental section we evaluate the features, their influence on the camera selection, and the properties of the generated video stream. The chosen features all allow real- or near-real-time processing and can therefore be applied not only to offline browsing but also to a remote meeting assistant.
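To make the abstract's idea concrete, here is a minimal sketch of a feature-driven mode selector. Everything in it is an assumption for illustration only: the seven mode names, the per-participant features, the fusion weights, and the thresholds are hypothetical and do not reproduce the paper's actual features or decision logic.

```python
from dataclasses import dataclass

# Seven hypothetical video modes: one close-up per participant, a room
# overview, a slide view, and a whiteboard view (illustrative, not the
# paper's actual mode set).
MODES = ["closeup_1", "closeup_2", "closeup_3", "closeup_4",
         "overview", "slides", "whiteboard"]

@dataclass
class FrameFeatures:
    speech_activity: list  # per-participant audio activity in [0, 1]
    visual_motion: list    # per-participant motion magnitude in [0, 1]
    slide_change: bool     # visual cue: a new slide was detected
    keyword_score: float   # lexical cue, e.g. deictic phrases like "here"

def select_mode(f: FrameFeatures) -> str:
    """Pick one of the seven modes by fusing audio, visual, and
    lexical evidence (weights and thresholds are made up)."""
    # Lexical/visual presentation cues override speaker shots.
    if f.slide_change or f.keyword_score > 0.8:
        return "slides"
    # Late fusion of audio and visual evidence per participant.
    scores = [0.7 * a + 0.3 * v
              for a, v in zip(f.speech_activity, f.visual_motion)]
    best = max(range(len(scores)), key=lambda i: scores[i])
    # Fall back to the overview when no participant clearly dominates.
    if scores[best] < 0.3:
        return "overview"
    return MODES[best]

feats = FrameFeatures(speech_activity=[0.1, 0.9, 0.2, 0.0],
                      visual_motion=[0.2, 0.5, 0.1, 0.1],
                      slide_change=False,
                      keyword_score=0.1)
print(select_mode(feats))  # participant 2 dominates -> "closeup_2"
```

In a real system such a per-frame decision would additionally be smoothed over time (e.g. with a minimum shot duration) to avoid rapid camera switching; the sketch omits that step.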