Visual focus of attention in adaptive language acquisition

  • Authors:
  • Ananth Sankar; Allen Gorin

  • Affiliations:
  • AT&T Bell Laboratories, Murray Hill, NJ (both authors)

  • Venue:
  • ICASSP'93 Proceedings of the 1993 IEEE international conference on Acoustics, speech, and signal processing: plenary, special, audio, underwater acoustics, VLSI, neural networks - Volume I
  • Year:
  • 1993
Abstract

In our research on Adaptive Language Acquisition, we have been investigating connectionist systems that learn the mapping from a message to a meaningful machine action through interaction with a complex environment. Previously, the only input to these systems has been the message. However, in many devices of interest, the action also depends on the state of the world, thereby motivating the study of systems with multisensory input. In this work, we describe and evaluate a device which acquires language through interaction with an environment which provides both keyboard and visual input. In particular, the machine action is to focus its attention, by directing its eyeball toward one of many blocks of different colors and shapes, in response to a message such as "Look at the red square". The attention focus is controlled by minimizing a time-varying potential function that correlates the message and visual input. This correlation is factored through color and shape sensory primitive subnetworks in an information-theoretic connectionist network, allowing the machine to generalize between different objects having the same color or shape. The system runs in a conversational mode where the user can provide clarifying messages and error feedback until the system responds correctly. During the course of performing its task, a vocabulary of 431 words was acquired from 11 users in over 1000 unconstrained natural language conversations. The average number of inputs for the machine to respond correctly was only 1.4 sentences, and it retained 98% of what it was taught.
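The abstract describes focusing attention by minimizing a time-varying potential function that correlates the message with visual input, with associations factored through color and shape primitives in an information-theoretic connectionist network. The toy sketch below illustrates that idea under stated assumptions: it is not the authors' implementation, and the class name, the pointwise-mutual-information association estimate, and the additive potential over word/attribute pairs are all illustrative choices of ours. It shows how reinforcement from correct interactions lets the system generalize across objects sharing a color or shape.

```python
import math
from collections import defaultdict

class AttentionSketch:
    """Toy sketch (not the paper's network): words accumulate
    co-occurrence counts with color/shape primitives, and attention
    goes to the block minimizing a negative-correlation potential."""

    def __init__(self):
        self.count = defaultdict(int)       # (word, attribute) co-occurrences
        self.word_count = defaultdict(int)  # word marginals
        self.attr_count = defaultdict(int)  # attribute marginals
        self.total = 0

    def reinforce(self, words, attributes):
        # Called after the user confirms a correct response,
        # standing in for the paper's conversational error feedback.
        for w in words:
            for a in attributes:
                self.count[(w, a)] += 1
                self.word_count[w] += 1
                self.attr_count[a] += 1
                self.total += 1

    def association(self, word, attr):
        # Pointwise mutual information estimate; zero for unseen pairs.
        c = self.count[(word, attr)]
        if c == 0:
            return 0.0
        p_joint = c / self.total
        p_w = self.word_count[word] / self.total
        p_a = self.attr_count[attr] / self.total
        return math.log(p_joint / (p_w * p_a))

    def focus(self, words, blocks):
        # blocks: list of (name, {attributes}); the potential is the
        # negated sum of word/attribute associations, so minimizing it
        # picks the block most correlated with the message.
        def potential(attrs):
            return -sum(self.association(w, a) for w in words for a in attrs)
        return min(blocks, key=lambda b: potential(b[1]))

machine = AttentionSketch()
machine.reinforce(["look", "at", "the", "red", "square"], {"red", "square"})
machine.reinforce(["look", "at", "the", "blue", "circle"], {"blue", "circle"})
blocks = [("block1", {"red", "square"}), ("block2", {"blue", "circle"})]
print(machine.focus(["red", "square"], blocks)[0])  # → block1
```

Because associations are stored per color and per shape primitive rather than per whole object, a novel "red circle" block would still attract attention from the message "red", which mirrors the cross-object generalization the abstract attributes to the factored subnetworks.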