Using audio, visual, and lexical features in a multi-modal virtual meeting director

  • Authors:
  • Marc Al-Hames, Benedikt Hörnler, Christoph Scheuermann, Gerhard Rigoll

  • Affiliation (all authors):
  • Institute for Human-Machine-Communication, Technische Universität München, Munich, Germany

  • Venue:
  • MLMI'06: Proceedings of the Third International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2006

Abstract

Multi-modal recordings of meetings provide the basis for meeting browsing and for remote meetings. However, it is often neither useful nor practical to store or transmit all visual channels. In this work we show how a virtual meeting director selects one of seven possible video modes. We then present several audio, visual, and lexical features for a virtual director. In an experimental section we evaluate the features, their influence on the camera selection, and the properties of the generated video stream. The chosen features all allow real- or near-real-time processing and can therefore be applied not only to offline browsing but also to a remote meeting assistant.
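To make the abstract's idea concrete, here is a minimal sketch of a feature-driven mode selector. Everything in it is an assumption for illustration only: the seven mode names, the per-participant features, the fusion weights, and the thresholds are hypothetical and do not reproduce the paper's actual features or decision logic.

```python
from dataclasses import dataclass

# Seven hypothetical video modes: one close-up per participant, a room
# overview, a slide view, and a whiteboard view (illustrative, not the
# paper's actual mode set).
MODES = ["closeup_1", "closeup_2", "closeup_3", "closeup_4",
         "overview", "slides", "whiteboard"]

@dataclass
class FrameFeatures:
    speech_activity: list  # per-participant audio activity in [0, 1]
    visual_motion: list    # per-participant motion magnitude in [0, 1]
    slide_change: bool     # visual cue: a new slide was detected
    keyword_score: float   # lexical cue, e.g. deictic phrases like "here"

def select_mode(f: FrameFeatures) -> str:
    """Pick one of the seven modes by fusing audio, visual, and
    lexical evidence (weights and thresholds are made up)."""
    # Lexical/visual presentation cues override speaker shots.
    if f.slide_change or f.keyword_score > 0.8:
        return "slides"
    # Late fusion of audio and visual evidence per participant.
    scores = [0.7 * a + 0.3 * v
              for a, v in zip(f.speech_activity, f.visual_motion)]
    best = max(range(len(scores)), key=lambda i: scores[i])
    # Fall back to the overview when no participant clearly dominates.
    if scores[best] < 0.3:
        return "overview"
    return MODES[best]

feats = FrameFeatures(speech_activity=[0.1, 0.9, 0.2, 0.0],
                      visual_motion=[0.2, 0.5, 0.1, 0.1],
                      slide_change=False,
                      keyword_score=0.1)
print(select_mode(feats))  # participant 2 dominates -> "closeup_2"
```

In a real system such a per-frame decision would additionally be smoothed over time (e.g. with a minimum shot duration) to avoid rapid camera switching; the sketch omits that step.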