Audio-Visual Processing in Meetings: Seven Questions and Current AMI Answers

  • Authors:
  • Marc Al-Hames; Thomas Hain; Jan Cernocky; Sascha Schreiber; Mannes Poel; Ronald Müller; Sebastien Marcel; David van Leeuwen; Jean-Marc Odobez; Sileye Ba; Herve Bourlard; Fabien Cardinaux; Daniel Gatica-Perez; Adam Janin; Petr Motlicek; Stephan Reiter; Steve Renals; Jeroen van Rest; Rutger Rienks; Gerhard Rigoll; Kevin Smith; Andrew Thean; Pavel Zemcik

  • Affiliations:
  • Institute for Human-Machine-Communication, Technische Universität München; Department of Computer Science, University of Sheffield; Faculty of Information Technology, Brno University of Technology; Institute for Human-Machine-Communication, Technische Universität München; Department of Computer Science, University of Twente; Institute for Human-Machine-Communication, Technische Universität München; IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); Netherlands Organisation for Applied Scientific Research (TNO); IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); International Computer Science Institute, Berkeley, CA; Faculty of Information Technology, Brno University of Technology; Institute for Human-Machine-Communication, Technische Universität München; Centre for Speech Technology Research, University of Edinburgh; Netherlands Organisation for Applied Scientific Research (TNO); Department of Computer Science, University of Twente; Institute for Human-Machine-Communication, Technische Universität München; IDIAP Research Institute and École Polytechnique Fédérale de Lausanne (EPFL); Netherlands Organisation for Applied Scientific Research (TNO); Faculty of Information Technology, Brno University of Technology

  • Venue:
  • MLMI'06: Proceedings of the Third International Conference on Machine Learning for Multimodal Interaction
  • Year:
  • 2006

Abstract

The project Augmented Multi-party Interaction (AMI) is concerned with the development of meeting browsers and remote meeting assistants for instrumented meeting rooms, and with the component technologies this requires across several R&D themes: group dynamics; audio, visual, and multimodal processing; content abstraction; and human-computer interaction. The audio-visual processing workpackage within AMI addresses automatic recognition from the audio, video, and combined audio-video streams recorded during meetings. In this article we describe the progress made in the first two years of the project. We show how the large problem of audio-visual processing in meetings can be split into seven questions, such as “Who is acting during the meeting?”, and we present the algorithms and methods that have been developed and evaluated to answer these questions automatically.