Visual recognition: computational models and human psychophysics

  • Authors:
  • Pietro Perona;Fei-Fei Li

  • Affiliations:
  • California Institute of Technology;California Institute of Technology

  • Venue:
  • Visual recognition: computational models and human psychophysics
  • Year:
  • 2005

Abstract

Object and scene recognition is one of the most essential functions of human vision. It is also of fundamental importance for machines to be able to learn and recognize meaningful objects and scenes. In this thesis, we explore the following four aspects of object and scene recognition.

It is well known that humans can be "blind" even to major aspects of natural scenes when they attend elsewhere. The only tasks that do not need attention appear to be carried out in the early stages of the visual system. Contrary to this common belief, we show that subjects can rapidly detect animals or vehicles in briefly presented novel natural scenes while simultaneously performing another attentionally demanding task. By comparison, they are unable to discriminate large T's from L's, or bisected two-color disks from their mirror images, under the same conditions. We explore this phenomenon further by removing color from the natural scenes or by increasing the number of images presented peripherally. We find evidence suggesting that familiarity and meaningfulness might be among the factors that determine attentional requirements for both natural and synthetic stimuli.

So what exactly do we see when we glance at a natural scene? And does what we see change as the glance becomes longer? We asked naive subjects to report what they saw in nearly a hundred briefly presented photographs. After each presentation, subjects reported what they had just seen as completely as possible. Afterward, another group of sophisticated individuals who were not aware of the goals of the experiment scored each of the descriptions produced by the subjects in the first stage. Individual scores were assigned to more than a hundred different attributes. Based on this evaluation of the responses, we show that human subjects perceive much object- and scene-level information within a single glance. But the richness of our perception seems asymmetrical. Subjects tend to be biased toward perceiving natural scenes as outdoor rather than indoor.

In computer vision, it is commonly held that learning visual models of object categories requires thousands of training examples. We show that it is possible to learn much information about a category from just one image, or a handful of images. (Abstract shortened by UMI.)