Multimodal integration for meeting group action segmentation and recognition

  • Authors and affiliations:
  • Marc Al-Hames (Institute for Human-Machine-Communication, Technische Universität München, Munich, Germany)
  • Alfred Dielmann (Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK)
  • Daniel Gatica-Perez (IDIAP Research Institute and Ecole Polytechnique Federale de Lausanne (EPFL), Martigny, Switzerland)
  • Stephan Reiter (Institute for Human-Machine-Communication, Technische Universität München, Munich, Germany)
  • Steve Renals (Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK)
  • Gerhard Rigoll (Institute for Human-Machine-Communication, Technische Universität München, Munich, Germany)
  • Dong Zhang (IDIAP Research Institute and Ecole Polytechnique Federale de Lausanne (EPFL), Martigny, Switzerland)

  • Venue:
  • MLMI'05: Proceedings of the Second International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2005

Abstract

We address the problem of segmentation and recognition of sequences of multimodal human interactions in meetings. These interactions can be seen as a rough structure of a meeting, and can be used either as input for a meeting browser or as a first step towards a higher semantic analysis of the meeting. A common lexicon of multimodal group meeting actions, a shared meeting data set, and a common evaluation procedure enable us to compare the different approaches. We compare three different multimodal feature sets and our modelling infrastructures: a higher semantic feature approach, multi-layer HMMs, a multi-stream DBN, as well as a multi-stream mixed-state DBN for disturbed data.
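For readers unfamiliar with the multi-stream idea behind the HMM and DBN infrastructures named above, the sketch below shows one common form of late fusion: per-stream log-likelihoods (e.g. from audio and visual features) are combined with stream weights before an action label is chosen for a segment. It is a minimal illustration only; the action lexicon, stream weights, and single-Gaussian class models are assumptions made for this example, not the paper's actual models or features.

```python
# Minimal sketch of weighted multi-stream late fusion for labelling one meeting
# segment. Illustrative only: the paper uses HMM and DBN infrastructures, and
# this lexicon, the weights, and the Gaussian class models are assumptions.
import numpy as np

ACTIONS = ["discussion", "monologue", "presentation", "whiteboard"]  # hypothetical lexicon

def gaussian_loglik(frames, mean, var):
    """Sum of per-frame log-likelihoods under a diagonal Gaussian (frames: T x D)."""
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + (frames - mean) ** 2 / var)))

def classify_segment(audio, visual, models, w_audio=0.6, w_visual=0.4):
    """Fuse weighted per-stream log-likelihoods and return the best-scoring action."""
    scores = {}
    for action in ACTIONS:
        m = models[action]
        scores[action] = (w_audio * gaussian_loglik(audio, m["a_mean"], m["a_var"])
                          + w_visual * gaussian_loglik(visual, m["v_mean"], m["v_var"]))
    return max(scores, key=scores.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy class models: one diagonal Gaussian per stream and per action.
    models = {a: {"a_mean": rng.normal(size=4), "a_var": np.ones(4),
                  "v_mean": rng.normal(size=6), "v_var": np.ones(6)}
              for a in ACTIONS}
    audio = rng.normal(size=(50, 4))    # 50 audio feature frames
    visual = rng.normal(size=(50, 6))   # 50 visual feature frames
    print(classify_segment(audio, visual, models))
```

In the paper's setting the per-action scores would come from sequence models (HMMs or DBN streams) rather than frame-wise Gaussians, and segmentation and recognition are performed jointly rather than on pre-cut segments.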