Grounded situation models for situated conversational assistants

Authors:
Rosalind W. Picard; Nikolaos Mavridis
Affiliations:
Massachusetts Institute of Technology; Massachusetts Institute of Technology
Venue:
Grounded situation models for situated conversational assistants
Year:
2007

Citing 0
Cited 1

FaceBots: robots utilizing and publishing social information in facebook

Proceedings of the 4th ACM/IEEE international conference on Human robot interaction

Abstract

A Situated Conversational Assistant (SCA) is a system with sensing, acting, and speech synthesis/recognition abilities that engages in physically situated natural language conversation with human partners and assists them in carrying out tasks. This thesis addresses some prerequisites for an ideal, truly cooperative SCA by developing a computational model of embodied, situated language agents and implementing the model as an interactive, conversational robot. The proposed model produces systems capable of a core set of situated natural language communication skills, and provides leverage for many extensions towards the ideal SCA, such as mind-reading skills. The central idea is to endow agents with a sensor-updated "structured blackboard" representational structure called a Grounded Situation Model (GSM), which is closely related to the cognitive psychology notion of situation models. The GSM serves as a workspace with contents similar to a "theatrical stage" in the agent's "mind". The GSM may be filled with the contents of the agent's present here-and-now physical situation, a past situation that is being recalled, or an imaginary situation that is being described or planned. Furthermore, the GSM contains descriptions of both physical aspects of situations (such as objects) and mental aspects (such as the beliefs of others). Most importantly, the proposed GSM design enables bidirectional translation between linguistic descriptions and perceptual data/expectations. To demonstrate viability, an instance of the model was implemented on a manipulator robot with touch, vision, and speech synthesis/recognition. The robot grasps the semantics of a range of words and speech acts related to cooperative manipulation of objects on a tabletop situated between the robot and the human.
The robot's language comprehension abilities are comparable to those implied by a standard and widely used test of children's language comprehension (the Token Test), and in some directions surpass them. The GSM proposal is thus shown to be not only viable but also effective, through a real-world autonomous robot whose performance is comparable to that of a normally developing three-year-old child on the capabilities assessed by the Token Test. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
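
The bidirectional translation described in the abstract can be illustrated with a toy sketch. This is not the thesis implementation: the class names, the single "size" property, and the word boundaries in `SIZE_WORDS` are all assumptions made for illustration, and mental aspects such as the beliefs of others are left out. It shows only one possible reading of the idea: a sensor reading fills the model with a precise value that can be put into words, while a verbal description fills it with an expectation (a range) that later sensor readings can be checked against.

```python
from dataclasses import dataclass, field

# Hypothetical category boundaries: word -> (low, high) on a 0..1 size scale.
SIZE_WORDS = {"small": (0.0, 0.33), "medium": (0.33, 0.66), "large": (0.66, 1.0)}


@dataclass
class ObjectEntry:
    """One object on the 'stage'; size is stored as a (low, high) interval."""
    name: str
    size_range: tuple = (0.0, 1.0)  # fully uncertain until sensed or described


@dataclass
class SituationModel:
    """Toy blackboard holding the objects of one situation (illustrative only)."""
    kind: str = "present"  # "present", "past", or "imagined"
    objects: dict = field(default_factory=dict)

    def update_from_sensor(self, name, size):
        """Perception -> model: a measurement collapses the interval to a point."""
        self.objects[name] = ObjectEntry(name, (size, size))

    def update_from_words(self, name, size_word):
        """Language -> model: a word yields an expectation (a range, not a point)."""
        self.objects[name] = ObjectEntry(name, SIZE_WORDS[size_word])

    def describe(self, name):
        """Model -> language: choose the word whose range contains the midpoint."""
        low, high = self.objects[name].size_range
        mid = (low + high) / 2
        for word, (w_low, w_high) in SIZE_WORDS.items():
            if w_low <= mid <= w_high:
                return f"the {word} {name}"
        return f"the {name}"

    def matches(self, name, size):
        """Model -> perception: test a new measurement against the expectation."""
        low, high = self.objects[name].size_range
        return low <= size <= high


# A sensed object can be described in words.
here_and_now = SituationModel(kind="present")
here_and_now.update_from_sensor("ball", 0.8)
print(here_and_now.describe("ball"))

# A described object yields an expectation that sensing can confirm or reject.
imagined = SituationModel(kind="imagined")
imagined.update_from_words("cube", "small")
print(imagined.matches("cube", 0.2), imagined.matches("cube", 0.9))
```

Storing an interval rather than a single value is a design choice of this sketch: it lets the same entry hold either a sensed measurement or a looser, language-derived expectation.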