Attending to visual motion

  • Authors:
  • John K. Tsotsos, Yueju Liu, Julio C. Martinez-Trujillo, Marc Pomplun, Evgueni Simine, Kunhao Zhou

  • Affiliations:
  • John K. Tsotsos, Yueju Liu, Evgueni Simine, Kunhao Zhou: Department of Computer Science and Engineering and Centre for Vision Research, York University, Toronto, Ont., Canada
  • Julio C. Martinez-Trujillo: Department of Physiology, McGill University, Montreal, Canada
  • Marc Pomplun: Department of Computer Science, University of Massachusetts, Boston, MA

  • Venue:
  • Computer Vision and Image Understanding - Special issue: Attention and performance in computer vision
  • Year:
  • 2005


Abstract

Visual motion analysis has focused on decomposing image sequences into their component features, with little success at re-combining those features into moving objects. Here a novel model of attentive visual motion processing is presented that addresses both the decomposition of the signal into constituent features and the re-combination, or binding, of those features into wholes. A new feed-forward motion-processing pyramid, motivated by the neurobiology of primate motion processing, is presented; on this structure the Selective Tuning (ST) model of visual attention is demonstrated. There are three main contributions: (1) a new feed-forward motion-processing hierarchy, the first to include a multi-level decomposition with local spatial derivatives of velocity; (2) examples of how ST operates on this hierarchy to attend to motion and to localize and label motion patterns; and (3) a new solution to the feature-binding problem sufficient for grouping motion features into coherent object motion. Binding is accomplished using a top-down selection mechanism that does not depend on a single location-based saliency representation.
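To make the two core ideas concrete, the sketch below illustrates, in simplified form, a coarse-to-fine velocity pyramid whose per-level response combines speed with local spatial derivatives of velocity, and an ST-style top-down winner-take-all that selects the coarsest-level winner and restricts each finer-level search to that winner's projection. This is a minimal illustration under stated assumptions, not the paper's implementation: the input is assumed to be a precomputed dense velocity field, and all names (build_velocity_pyramid, motion_saliency, top_down_select) are hypothetical.

```python
import numpy as np

def build_velocity_pyramid(vx, vy, levels=3):
    """Coarse-to-fine pyramid of a dense velocity field via 2x2 averaging.
    Assumes even image dimensions. pyramid[0] is finest, pyramid[-1] coarsest."""
    pyramid = [(vx, vy)]
    for _ in range(levels - 1):
        vx = 0.25 * (vx[0::2, 0::2] + vx[1::2, 0::2] + vx[0::2, 1::2] + vx[1::2, 1::2])
        vy = 0.25 * (vy[0::2, 0::2] + vy[1::2, 0::2] + vy[0::2, 1::2] + vy[1::2, 1::2])
        pyramid.append((vx, vy))
    return pyramid

def motion_saliency(vx, vy):
    """Crude motion-pattern energy: speed plus local spatial derivatives of
    velocity (a stand-in for the paper's derivative-of-velocity channels)."""
    dvx_dy, dvx_dx = np.gradient(vx)
    dvy_dy, dvy_dx = np.gradient(vy)
    speed = np.hypot(vx, vy)
    deriv_energy = np.abs(dvx_dx) + np.abs(dvx_dy) + np.abs(dvy_dx) + np.abs(dvy_dy)
    return speed + deriv_energy

def top_down_select(pyramid):
    """ST-style top-down selection: pick the global winner at the coarsest
    level, then at each finer level run winner-take-all only inside the 2x2
    projection of the coarser winner, down to the finest level."""
    sal = motion_saliency(*pyramid[-1])
    y, x = np.unravel_index(np.argmax(sal), sal.shape)
    for vx, vy in reversed(pyramid[:-1]):
        sal = motion_saliency(vx, vy)
        y0, x0 = 2 * y, 2 * x
        window = sal[y0:y0 + 2, x0:x0 + 2]
        dy, dx = np.unravel_index(np.argmax(window), window.shape)
        y, x = y0 + dy, x0 + dx
    return y, x  # attended location at the finest resolution

# Toy field: a small rightward-translating patch on a static background.
vx = np.zeros((32, 32))
vy = np.zeros((32, 32))
vx[10:14, 20:24] = 1.0
pyr = build_velocity_pyramid(vx, vy, levels=3)
print("attended location:", top_down_select(pyr))  # lands inside the patch
```

Note the design point this toy preserves: selection descends the hierarchy by pruning, rather than reading a winner off one global location-based saliency map. The full ST model goes further, propagating inhibition through the interpretive hierarchy so that the pass zone traced out by the descent groups the motion features belonging to one object.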