Temporal binding of multimodal controls for dynamic map displays: a systems approach

  • Authors:
  • Ellen C. Haas; Krishna S. Pillalamarri; Chris C. Stachowiak; Gardner McCullough

  • Affiliations:
  • Human Research and Engineering Directorate, Aberdeen Proving Ground, MD, USA; Human Research and Engineering Directorate, Aberdeen Proving Ground, MD, USA; Human Research and Engineering Directorate, Aberdeen Proving Ground, MD, USA; University of Maryland Baltimore County, Baltimore, MD, USA

  • Venue:
  • ICMI '11: Proceedings of the 13th International Conference on Multimodal Interfaces
  • Year:
  • 2011

Abstract

Dynamic map displays are visual interfaces that show the spatial positions of objects of interest (e.g., people, robots, vehicles) and can be updated by user commands as well as by changes in the world, often in real time. Multimodal (speech and touch) controls were designed for a U.S. Army Research Laboratory dynamic map display to allow users to exercise supervisory control of a simulated robotic swarm. This study characterized how user performance factors (input difficulty, modality preference, and response to different levels of workload) affect multimodal intercommand time (i.e., temporal binding), and explored how this might relate to the system's ability to bind, or fuse, multimodal user inputs into a unitary response. User performance was tested in a laboratory study with 12 volunteers (6 male, 6 female) with a mean age of 26 years. Results showed that 64% of all participants used speech commands first 100% of the time, while the remaining participants used touch commands first 100% of the time. Temporal binding between touch and voice commands was significantly shorter for touch-first than for speech-first commands, regardless of workload level. For both speech and touch commands, temporal binding was significantly shorter for roads and swarm edges than for intersections. Results indicated that all of these factors can be significant in determining a system's ability to bind multimodal inputs into a unitary response. Suggestions for future research are described.
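The idea of temporal binding can be illustrated with a minimal time-window fusion sketch. It assumes that each speech or touch input is logged with onset and offset timestamps. It treats intercommand time as the gap from the end of the first input to the start of the second, and it fuses a speech/touch pair into one command only when that gap falls within a fixed window. The `InputEvent` type, the `intercommand_time` and `try_fuse` functions, and the 2.0 s default window are all hypothetical. They are not taken from the study, whose exact definition of intercommand time and fusion logic may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InputEvent:
    modality: str   # "speech" or "touch"
    start: float    # onset time, seconds
    end: float      # offset time, seconds
    payload: str    # hypothetical content, e.g. a spoken verb or a touched map object


def intercommand_time(first: InputEvent, second: InputEvent) -> float:
    # Assumed (hypothetical) definition: offset of the first input to onset of the second.
    # A negative value means the two inputs overlapped in time.
    return second.start - first.end


def try_fuse(a: InputEvent, b: InputEvent, window_s: float = 2.0) -> Optional[dict]:
    # Order the pair by onset so the result records which modality came first.
    first, second = sorted((a, b), key=lambda e: e.start)
    if first.modality == second.modality:
        return None  # only cross-modal pairs are candidates for fusion
    gap = intercommand_time(first, second)
    if gap > window_s:
        return None  # too far apart: treat as two separate commands
    return {
        "order": f"{first.modality}-first",
        "gap_s": gap,
        "command": (first.payload, second.payload),
    }


if __name__ == "__main__":
    speech = InputEvent("speech", 0.0, 1.2, "move swarm here")
    touch = InputEvent("touch", 1.5, 1.6, "road segment")
    print(try_fuse(speech, touch))
    late_touch = InputEvent("touch", 4.0, 4.1, "intersection")
    print(try_fuse(speech, late_touch))  # None with the default 2.0 s window
```

A single fixed window is a simplification. Because the reported intercommand times differed by input order (touch-first vs. speech-first) and by target type (roads and swarm edges vs. intersections), a fusion component could instead use a separate window for each order or target type.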