Multimodal integration for meeting group action segmentation and recognition

  • Authors and affiliations:
  • Marc Al-Hames (Institute for Human-Machine-Communication, Technische Universität München, Munich, Germany)
  • Alfred Dielmann (Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK)
  • Daniel Gatica-Perez (IDIAP Research Institute and Ecole Polytechnique Federale de Lausanne (EPFL), Martigny, Switzerland)
  • Stephan Reiter (Institute for Human-Machine-Communication, Technische Universität München, Munich, Germany)
  • Steve Renals (Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK)
  • Gerhard Rigoll (Institute for Human-Machine-Communication, Technische Universität München, Munich, Germany)
  • Dong Zhang (IDIAP Research Institute and Ecole Polytechnique Federale de Lausanne (EPFL), Martigny, Switzerland)

  • Venue:
  • MLMI'05: Proceedings of the Second International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2005

Abstract

We address the problem of segmentation and recognition of sequences of multimodal human interactions in meetings. These interactions can be seen as a rough structure of a meeting, and can be used either as input for a meeting browser or as a first step towards a higher semantic analysis of the meeting. A common lexicon of multimodal group meeting actions, a shared meeting data set, and a common evaluation procedure enable us to compare the different approaches. We compare three different multimodal feature sets and our modelling infrastructures: a higher semantic feature approach, multi-layer HMMs, a multi-stream DBN, as well as a multi-stream mixed-state DBN for disturbed data.
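For readers unfamiliar with the multi-stream idea behind the HMM and DBN infrastructures named above, the sketch below shows one common form of late fusion: per-stream log-likelihoods (e.g. from audio and visual features) are combined with stream weights before an action label is chosen for a segment. It is a minimal illustration only; the action lexicon, stream weights, and single-Gaussian class models are assumptions made for this example, not the paper's actual models or features.

```python
# Minimal sketch of weighted multi-stream late fusion for labelling one meeting
# segment. Illustrative only: the paper uses HMM and DBN infrastructures, and
# this lexicon, the weights, and the Gaussian class models are assumptions.
import numpy as np

ACTIONS = ["discussion", "monologue", "presentation", "whiteboard"]  # hypothetical lexicon

def gaussian_loglik(frames, mean, var):
    """Sum of per-frame log-likelihoods under a diagonal Gaussian (frames: T x D)."""
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + (frames - mean) ** 2 / var)))

def classify_segment(audio, visual, models, w_audio=0.6, w_visual=0.4):
    """Fuse weighted per-stream log-likelihoods and return the best-scoring action."""
    scores = {}
    for action in ACTIONS:
        m = models[action]
        scores[action] = (w_audio * gaussian_loglik(audio, m["a_mean"], m["a_var"])
                          + w_visual * gaussian_loglik(visual, m["v_mean"], m["v_var"]))
    return max(scores, key=scores.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy class models: one diagonal Gaussian per stream and per action.
    models = {a: {"a_mean": rng.normal(size=4), "a_var": np.ones(4),
                  "v_mean": rng.normal(size=6), "v_var": np.ones(6)}
              for a in ACTIONS}
    audio = rng.normal(size=(50, 4))    # 50 audio feature frames
    visual = rng.normal(size=(50, 6))   # 50 visual feature frames
    print(classify_segment(audio, visual, models))
```

In the paper's setting the per-action scores would come from sequence models (HMMs or DBN streams) rather than frame-wise Gaussians, and segmentation and recognition are performed jointly rather than on pre-cut segments.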