Multimodal response generation in GIS

  • Authors:
  • Levent Bolelli

  • Affiliations:
  • Pennsylvania State University, University Park, PA

  • Venue:
  • Proceedings of the 6th international conference on Multimodal interfaces
  • Year:
  • 2004

Abstract

Advances in computer hardware and software technologies have enabled sophisticated information visualization techniques, as well as new interaction opportunities, to be introduced in the development of GIS (Geographical Information Systems) applications. In particular, research efforts in computer vision and natural language processing have enabled users to interact with computer applications using natural speech and gestures, which has proven to be effective for interacting with dynamic maps [1, 6]. Pen-based mobile devices and gesture recognition systems enable system designers to define application-specific gestures for carrying out particular tasks, and using a force-feedback mouse to interact with GIS has been proposed for visually impaired people [4]. These are exciting new opportunities and hold the promise of advancing interaction with computers to a completely new level. The ultimate aim, however, should be to facilitate human-computer communication; that is, equal emphasis should be given to both understanding and generation of multimodal behavior.

My proposed research will provide a conceptual framework and a computational model for generating multimodal responses that communicate spatial information along with dynamically generated maps. The model will eventually lead to the development of a computational agent that can reason about distributing the semantic and pragmatic content of the intended response message among speech, deictic gestures and visual information. In other words, the system will be able to select the most natural and effective mode(s) of communicating back to the user. Any research in computer science that investigates direct interaction between computers and humans should place human factors at center stage. Therefore, this work will follow a multi-disciplinary approach, integrating ideas from previous research in Psychology, Cognitive Science, Linguistics, Cartography, Geographical Information Science (GIScience) and Computer Science. This will enable us to identify and address the human, cartographic and computational issues involved in response planning, and to assist users with their spatial decision making by facilitating their visual thinking process and reducing their cognitive load.

The methodology will be integrated into the design of the DAVE_G [7] prototype: a natural, multimodal, mixed-initiative dialogue interface to GIS. The system is currently capable of recognizing, interpreting and fusing users' naturally occurring speech and gesture requests, and of generating natural speech output. The communication between the system and the user is modeled following collaborative discourse theory [2] and maintains a Recipe Graph [5] structure, based on SharedPlan theory [3], to represent the intentional structure of the discourse between the user and the system. One major concern in generating speech responses for dynamic maps is that spatial information cannot be effectively communicated using speech alone. Altering perceptual attributes (e.g. color, size, pattern) of the visual data to direct the user's attention to a particular location on the map is not usually effective, since each attribute carries an inherent semantic meaning; such attributes should be modified only when the system judges that they are not crucial to the user's understanding of the situation at that stage of the task.
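To make that judgement concrete, here is a minimal sketch in Python; all names and structures in it are hypothetical illustrations, not the DAVE_G implementation. It checks whether a perceptual attribute of a map layer can be safely altered to direct the user's attention:

    from dataclasses import dataclass, field

    @dataclass
    class MapLayer:
        """A map layer whose perceptual attributes may encode task-relevant semantics."""
        name: str
        # Which attribute currently encodes which meaning, e.g. {"color": "land use"}.
        semantic_bindings: dict = field(default_factory=dict)

    def can_alter_attribute(layer: MapLayer, attribute: str, crucial_meanings: set) -> bool:
        """Permit altering an attribute for highlighting only if its current
        semantic binding is not crucial to the user's task at this stage."""
        bound_meaning = layer.semantic_bindings.get(attribute)
        return bound_meaning is None or bound_meaning not in crucial_meanings

    # Example: color encodes land use, which the current task depends on, so the
    # system should direct attention by other means (e.g. a deictic gesture).
    parcels = MapLayer("parcels", {"color": "land use"})
    print(can_alter_attribute(parcels, "color", {"land use"}))  # False
    print(can_alter_attribute(parcels, "size", {"land use"}))   # True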
Gesticulation, in contrast to speech, is powerful for conveying the location and form of spatially oriented information [6] without manipulating the map, and it has the added benefit of facilitating speech production. My research aims at designing a feasible, extensible and effective multimodal response generation model (content planning and modality allocation); a toy illustration of the allocation step is sketched below. A plan-based reasoning algorithm and methodology integrated with the Recipe Graph structure has the potential to achieve those goals.
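As an illustration of what such a modality allocation step could look like, the following minimal sketch assumes that each content unit of a response is tagged as spatial or non-spatial; the types and rules here are hypothetical, not the proposed model itself:

    from dataclasses import dataclass
    from enum import Enum, auto

    class Modality(Enum):
        SPEECH = auto()
        DEICTIC_GESTURE = auto()
        MAP_DISPLAY = auto()

    @dataclass
    class ContentUnit:
        """One unit of the intended response message."""
        description: str
        is_spatial: bool      # does the unit convey location or form?
        map_is_stable: bool   # can the display change without losing crucial meaning?

    def allocate(unit: ContentUnit) -> Modality:
        # Non-spatial content is handled well by speech.
        if not unit.is_spatial:
            return Modality.SPEECH
        # Spatial content: alter the display when safe, otherwise point at it.
        return Modality.MAP_DISPLAY if unit.map_is_stable else Modality.DEICTIC_GESTURE

    response = [
        ContentUnit("acknowledge the request", is_spatial=False, map_is_stable=True),
        ContentUnit("indicate the affected region", is_spatial=True, map_is_stable=False),
    ]
    for unit in response:
        print(unit.description, "->", allocate(unit).name)

A plan-based allocator along the lines proposed above would replace these hard-coded rules with reasoning over the intentional structure captured in the Recipe Graph.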