A framework for evaluating multimodal integration by humans and a role for embodied conversational agents

Authors:
Dominic W. Massaro
Affiliations:
University of California, Santa Cruz, Santa Cruz, CA
Venue:
Proceedings of the 6th international conference on Multimodal interfaces
Year:
2004



Abstract

One of the implicit assumptions of multimodal interfaces is that human-computer interaction is significantly facilitated by providing multiple input and output modalities. Surprisingly, however, there is very little theoretical and empirical research testing this assumption in terms of the presentation of multimodal displays to the user. The goal of this paper is to provide both a theoretical and an empirical framework for addressing this issue. Two contrasting models of human information processing are formulated and compared in experimental tests. According to integration models, multiple sensory influences are continuously combined during categorization, leading to perceptual experience and action. One such model, the Fuzzy Logical Model of Perception (FLMP), assumes that processing occurs in three successive but overlapping stages: evaluation, integration, and decision (Massaro, 1998). According to nonintegration models, any perceptual experience and action results from only a single sensory influence. These models are tested in expanded factorial designs, in which two input modalities are varied independently of one another and each modality is also presented alone. Results from a variety of experiments on speech, emotion, and gesture support the predictions of the FLMP. Baldi, an embodied conversational agent, is described, and implications for applications of multimodal interfaces are discussed.
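
The contrast between the two model classes can be made concrete with a small sketch. It is not taken from the paper: it assumes a task with two response alternatives, auditory and visual sources whose support for one alternative is a value strictly between 0 and 1, the commonly cited two-alternative form of the FLMP (multiplicative integration followed by a relative goodness decision), and a simple single-channel mixture as one example of a nonintegration model. All function names, the `p_auditory` parameter, and the support values are illustrative assumptions.

```python
from itertools import product


def flmp(a=None, v=None):
    """P(alternative A) under a two-alternative FLMP sketch.

    a, v: support for alternative A from the auditory and visual source,
    each assumed to lie strictly between 0 and 1; None means the source
    was not presented (unimodal condition).
    """
    supports = [s for s in (a, v) if s is not None]
    if not supports:
        raise ValueError("at least one modality must be presented")
    # Integration: multiply the support for A, and the support against A.
    support_for, support_against = 1.0, 1.0
    for s in supports:
        support_for *= s
        support_against *= 1.0 - s
    total = support_for + support_against
    if total == 0.0:
        raise ValueError("supports of exactly 0 and 1 conflict; use values in (0, 1)")
    # Decision: relative goodness of A among the two alternatives.
    return support_for / total


def single_channel(a=None, v=None, p_auditory=0.5):
    """P(alternative A) under a nonintegration sketch: on each bimodal
    trial only one source is used, the auditory one with probability
    p_auditory (a hypothetical parameter)."""
    if a is None and v is None:
        raise ValueError("at least one modality must be presented")
    if a is None:
        return v
    if v is None:
        return a
    return p_auditory * a + (1.0 - p_auditory) * v


def expanded_factorial(a_levels, v_levels):
    """All bimodal combinations plus each modality presented alone."""
    bimodal = list(product(a_levels, v_levels))
    auditory_only = [(a, None) for a in a_levels]
    visual_only = [(None, v) for v in v_levels]
    return bimodal + auditory_only + visual_only


if __name__ == "__main__":
    # Illustrative support values, not data from the paper.
    for a, v in expanded_factorial([0.2, 0.8], [0.2, 0.8]):
        print(a, v, round(flmp(a, v), 3), round(single_channel(a, v), 3))
```

Under these assumptions, two sources that each moderately support the same alternative (0.8 and 0.8) yield a prediction of about 0.94 from the FLMP sketch but only 0.8 from the single-channel sketch, while both models agree in the unimodal conditions. Differences of this kind are what an expanded factorial design allows an experiment to detect.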