Audio-visual localization with hierarchical topographic maps: Modeling the superior colliculus

  • Authors:
  • M. C. Casey; A. Pavlou; A. Timotheou

  • Affiliations:
  • Department of Computing, University of Surrey, Guildford, Surrey GU2 7XH, UK (all authors)

  • Venue:
  • Neurocomputing
  • Year:
  • 2012


Abstract

A key attribute of the brain is its ability to seamlessly integrate sensory information to form a multisensory representation of the world. In early perceptual processing, the superior colliculus (SC) takes a leading role in integrating visual, auditory and somatosensory stimuli in order to direct eye movements. The SC forms a representation of multisensory space through a layering of retinotopic maps which are sensitive to different types of stimuli. These eye-centered topographic maps can adapt to crossmodal stimuli so that the SC can automatically shift our gaze, moderated by cortical feedback. In this paper we describe a neural network model of the SC consisting of a hierarchy of nine topographic maps that combine to form a multisensory retinotopic representation of audio-visual space. Our motivation is to evaluate whether a biologically plausible model of the SC can localize audio-visual inputs live from a camera and two microphones. We use spatial contrast and a novel form of temporal contrast for visual sensitivity, and interaural level difference for auditory sensitivity. Results are comparable with the performance observed in cats, where coincident stimuli are accurately localized, while presentation of disparate stimuli causes a significant drop in performance. The benefit of crossmodal localization is shown by adding increasing amounts of noise to the visual stimuli, to the point where audio-visual localization significantly outperforms visual-only localization. This work demonstrates how a novel, biologically motivated model of low-level multisensory processing can be applied to practical, real-world input in real time, while maintaining its comparability with biology.
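
The abstract names two sensory cues: interaural level difference (ILD) for audition and a contrast measure for vision. As a rough illustration only, the Python sketch below computes a conventional RMS-based ILD and a simple frame-differencing temporal contrast. The function names and details are assumptions for illustration, not the authors' implementation; in particular, the paper's "novel form of temporal contrast" is not specified in the abstract, so plain frame differencing stands in for it here.

```python
import numpy as np

def interaural_level_difference(left, right, eps=1e-12):
    """Estimate ILD in dB between two microphone frames.

    Hypothetical helper: ILD is the auditory cue named in the
    abstract, but this RMS-ratio formulation is an assumption,
    not the authors' method.
    """
    rms_l = np.sqrt(np.mean(left ** 2) + eps)
    rms_r = np.sqrt(np.mean(right ** 2) + eps)
    return 20.0 * np.log10(rms_l / rms_r)

def temporal_contrast(prev_frame, curr_frame):
    """Per-pixel frame differencing as a stand-in for the paper's
    (unspecified) temporal-contrast measure."""
    return np.abs(curr_frame.astype(float) - prev_frame.astype(float))

# Toy usage with synthetic data: a source off to the left should
# yield a positive ILD (left channel louder than right).
rng = np.random.default_rng(0)
noise = rng.standard_normal(1024)
left, right = 1.5 * noise, 0.8 * noise
print(f"ILD: {interaural_level_difference(left, right):+.1f} dB")
```

In a model like the one described, a cue of this kind would feed an auditory topographic map, with the visual contrast maps and the combined multisensory map layered above it; the mapping from ILD to azimuth is left out here because the abstract does not describe it.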