Architecture and implementation of multimodal plug and play

  • Authors:
  • Christian Elting; Stefan Rapp; Gregor Möhler; Michael Strube

  • Affiliations:
  • European Media Laboratory GmbH, Heidelberg, Germany; Sony Corporate Laboratories Europe, Stuttgart, Germany; Sony Corporate Laboratories Europe, Stuttgart, Germany; European Media Laboratory GmbH, Heidelberg, Germany

  • Venue:
  • Proceedings of the 5th International Conference on Multimodal Interfaces
  • Year:
  • 2003

Abstract

This paper describes the handling of multimodality in the Embassi system. Here, multimodality is treated in two modules. Firstly, a modality fusion component merges speech, video-traced pointing gestures, and input from a graphical user interface. Secondly, a presentation planning component decides which modalities to use for output, i.e., speech, an animated life-like character (ALC), and/or the graphical user interface, and ensures that the presentation is coherent and cohesive. We describe how these two components work and emphasize one particular feature of our system architecture: all modality analysis components generate output in a common semantic description format, and all renderer components process input in a common output language. This makes it particularly easy to add or remove modality analyzers or renderer components, even dynamically while the system is running. This plug and play of modalities can be used to adjust the system's capabilities to the differing demands of users and their situational context. In this paper we give details about the implementation of the models, protocols, and modules necessary to realize these features.
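
The following sketch is purely illustrative and not the Embassi implementation: it shows, under the assumptions of a hypothetical ModalityHub class with hypothetical SemanticEvent and PresentationAct types, how a shared semantic input format and a shared output language could allow analyzers and renderers to be attached or detached at runtime, which is the plug-and-play property the abstract describes.

```python
"""Minimal sketch (hypothetical names throughout): modality analyzers emit one
common semantic format, renderers consume one common output language, and both
kinds of components can be plugged in or removed while the system runs."""

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SemanticEvent:
    """Common semantic description produced by every modality analyzer."""
    modality: str                      # e.g. "speech", "pointing", "gui"
    intent: str                        # e.g. "select", "navigate"
    slots: Dict[str, str] = field(default_factory=dict)


@dataclass
class PresentationAct:
    """Common output language consumed by every renderer."""
    content: str
    modalities: List[str]              # e.g. ["speech", "gui"]


class ModalityHub:
    """Plug-and-play hub: analyzers and renderers register at runtime."""

    def __init__(self) -> None:
        self.analyzers: Dict[str, Callable[[object], SemanticEvent]] = {}
        self.renderers: Dict[str, Callable[[PresentationAct], None]] = {}

    def add_analyzer(self, name: str, fn: Callable[[object], SemanticEvent]) -> None:
        self.analyzers[name] = fn       # attach while the system is running

    def remove_analyzer(self, name: str) -> None:
        self.analyzers.pop(name, None)  # detach while the system is running

    def add_renderer(self, name: str, fn: Callable[[PresentationAct], None]) -> None:
        self.renderers[name] = fn

    def fuse(self, events: List[SemanticEvent]) -> SemanticEvent:
        """Toy fusion: later events fill slots left open by earlier ones."""
        merged = SemanticEvent(modality="fused", intent=events[0].intent)
        for ev in events:
            for key, value in ev.slots.items():
                merged.slots.setdefault(key, value)
        return merged

    def present(self, act: PresentationAct) -> None:
        """Route the presentation act to every renderer whose modality is requested."""
        for name, render in self.renderers.items():
            if name in act.modalities:
                render(act)


if __name__ == "__main__":
    hub = ModalityHub()
    hub.add_renderer("speech", lambda act: print(f"[TTS] {act.content}"))
    hub.add_renderer("gui", lambda act: print(f"[GUI] {act.content}"))

    # Fuse a spoken command with a pointing gesture, then present multimodally.
    fused = hub.fuse([SemanticEvent("speech", "select", {"object": "TV"}),
                      SemanticEvent("pointing", "select", {"position": "left"})])
    hub.present(PresentationAct(content=f"Selecting the {fused.slots['object']}",
                                modalities=["speech", "gui"]))
```

Because every analyzer and renderer speaks only the shared formats, the hub never needs component-specific logic, which is what makes adding or removing modalities at runtime straightforward in this sketch.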