Graphical models for multi-modal automatic video editing in meetings

Authors:
Benedikt Hörnler;Dejan Arsic;Björn Schuller;Gerhard Rigoll
Affiliations:
Technische Universität München, Institute for Human-Machine-Communication, Munich, Germany (all authors)
Venue:
DSP'09 Proceedings of the 16th international conference on Digital Signal Processing
Year:
2009

Abstract

In this work we present a multi-modal video editing system for meetings that uses graphical models to segment and classify the video modes. Video editing here means selecting, from the various available cameras, the one that best represents the meeting. To this end, a new training structure for graphical models was developed; it is needed to learn the segment boundaries jointly with the video mode itself. All developed and known decoding structures can easily be connected to our training structure for EM training. The results achieved by the system are state of the art.