Audio-Visual Processing in Meetings: Seven Questions and Current AMI Answers

  • Authors:
  • Marc Al-Hames; Thomas Hain; Jan Cernocky; Sascha Schreiber; Mannes Poel; Ronald Müller; Sebastien Marcel; David van Leeuwen; Jean-Marc Odobez; Sileye Ba; Herve Bourlard; Fabien Cardinaux; Daniel Gatica-Perez; Adam Janin; Petr Motlicek; Stephan Reiter; Steve Renals; Jeroen van Rest; Rutger Rienks; Gerhard Rigoll; Kevin Smith; Andrew Thean; Pavel Zemcik

  • Affiliations:
  • Institute for Human-Machine-Communication, Technische Universität München; Department of Computer Science, University of Sheffield; Faculty of Information Technology, Brno University of Technology; Institute for Human-Machine-Communication, Technische Universität München; Department of Computer Science, University of Twente; Institute for Human-Machine-Communication, Technische Universität München; IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); Netherlands Organisation for Applied Scientific Research (TNO); IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); International Computer Science Institute, Berkeley, CA; Faculty of Information Technology, Brno University of Technology; Institute for Human-Machine-Communication, Technische Universität München; Centre for Speech Technology Research, University of Edinburgh; Netherlands Organisation for Applied Scientific Research (TNO); Department of Computer Science, University of Twente; Institute for Human-Machine-Communication, Technische Universität München; IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); Netherlands Organisation for Applied Scientific Research (TNO); Faculty of Information Technology, Brno University of Technology

  • Venue:
  • MLMI'06: Proceedings of the Third International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2006

Abstract

The project Augmented Multi-party Interaction (AMI) is concerned with the development of meeting browsers and remote meeting assistants for instrumented meeting rooms, and with the component technologies this requires across several R&D themes: group dynamics; audio, visual, and multimodal processing; content abstraction; and human-computer interaction. The audio-visual processing workpackage within AMI addresses automatic recognition from the audio, video, and combined audio-video streams recorded during meetings. In this article we describe the progress made in the first two years of the project. We show how the large problem of audio-visual processing in meetings can be split into seven questions, such as “Who is acting during the meeting?”, and we present the algorithms and methods that have been developed and evaluated to answer these questions automatically.