Architecture and implementation of multimodal plug and play

  • Authors:
  • Christian Elting; Stefan Rapp; Gregor Möhler; Michael Strube

  • Affiliations:
  • European Media Laboratory GmbH, Heidelberg, Germany; Sony Corporate Laboratories Europe, Stuttgart, Germany; Sony Corporate Laboratories Europe, Stuttgart, Germany; European Media Laboratory GmbH, Heidelberg, Germany

  • Venue:
  • Proceedings of the 5th International Conference on Multimodal Interfaces
  • Year:
  • 2003

Abstract

This paper describes the handling of multimodality in the Embassi system. Here, multimodality is treated in two modules. Firstly, a modality fusion component merges speech, video-traced pointing gestures, and input from a graphical user interface. Secondly, a presentation planning component decides which modalities to use for output, i.e., speech, an animated life-like character (ALC), and/or the graphical user interface, and ensures that the presentation is coherent and cohesive. We describe how these two components work and emphasize one particular feature of our system architecture: all modality analysis components generate output in a common semantic description format, and all renderer components process input in a common output language. This makes it particularly easy to add or remove modality analyzers or renderer components, even dynamically while the system is running. This plug and play of modalities can be used to adjust the system's capabilities to the differing demands of users and their situational context. In this paper we give details about the implementation of the models, protocols, and modules necessary to realize these features.
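
The following sketch is purely illustrative and not the Embassi implementation: it shows, under the assumptions of a hypothetical ModalityHub class with hypothetical SemanticEvent and PresentationAct types, how a shared semantic input format and a shared output language could allow analyzers and renderers to be attached or detached at runtime, which is the plug-and-play property the abstract describes.

```python
"""Minimal sketch (hypothetical names throughout): modality analyzers emit one
common semantic format, renderers consume one common output language, and both
kinds of components can be plugged in or removed while the system runs."""

from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SemanticEvent:
    """Common semantic description produced by every modality analyzer."""
    modality: str                      # e.g. "speech", "pointing", "gui"
    intent: str                        # e.g. "select", "navigate"
    slots: Dict[str, str] = field(default_factory=dict)


@dataclass
class PresentationAct:
    """Common output language consumed by every renderer."""
    content: str
    modalities: List[str]              # e.g. ["speech", "gui"]


class ModalityHub:
    """Plug-and-play hub: analyzers and renderers register at runtime."""

    def __init__(self) -> None:
        self.analyzers: Dict[str, Callable[[object], SemanticEvent]] = {}
        self.renderers: Dict[str, Callable[[PresentationAct], None]] = {}

    def add_analyzer(self, name: str, fn: Callable[[object], SemanticEvent]) -> None:
        self.analyzers[name] = fn       # attach while the system is running

    def remove_analyzer(self, name: str) -> None:
        self.analyzers.pop(name, None)  # detach while the system is running

    def add_renderer(self, name: str, fn: Callable[[PresentationAct], None]) -> None:
        self.renderers[name] = fn

    def fuse(self, events: List[SemanticEvent]) -> SemanticEvent:
        """Toy fusion: later events fill slots left open by earlier ones."""
        merged = SemanticEvent(modality="fused", intent=events[0].intent)
        for ev in events:
            for key, value in ev.slots.items():
                merged.slots.setdefault(key, value)
        return merged

    def present(self, act: PresentationAct) -> None:
        """Route the presentation act to every renderer whose modality is requested."""
        for name, render in self.renderers.items():
            if name in act.modalities:
                render(act)


if __name__ == "__main__":
    hub = ModalityHub()
    hub.add_renderer("speech", lambda act: print(f"[TTS] {act.content}"))
    hub.add_renderer("gui", lambda act: print(f"[GUI] {act.content}"))

    # Fuse a spoken command with a pointing gesture, then present multimodally.
    fused = hub.fuse([SemanticEvent("speech", "select", {"object": "TV"}),
                      SemanticEvent("pointing", "select", {"position": "left"})])
    hub.present(PresentationAct(content=f"Selecting the {fused.slots['object']}",
                                modalities=["speech", "gui"]))
```

Because every analyzer and renderer speaks only the shared formats, the hub never needs component-specific logic, which is what makes adding or removing modalities at runtime straightforward in this sketch.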