Basic level scene understanding: from labels to structure and beyond

  • Authors:
  • Jianxiong Xiao;Bryan C. Russell;James Hays;Krista A. Ehinger;Aude Oliva;Antonio Torralba

  • Affiliations:
  • Massachusetts Institute of Technology;University of Washington;Brown University;Massachusetts Institute of Technology;Massachusetts Institute of Technology;Massachusetts Institute of Technology

  • Venue:
  • SIGGRAPH Asia 2012 Technical Briefs
  • Year:
  • 2012

Abstract

An early goal of computer vision was to build a system that could automatically understand a 3D scene just by looking. This requires not only the ability to extract 3D information from image information alone, but also to handle the large variety of different environments that comprise our visual world. This paper summarizes our recent efforts toward these goals. First, we describe the SUN database, which is a collection of annotated images spanning 908 different scene categories. This database allows us to systematically study the space of possible everyday scenes and to establish a benchmark for scene and object recognition. We also explore ways of coping with the variety of viewpoints within these scenes. For this, we have introduced a database of 360° panoramic images for many of the scene categories in the SUN database and have explored viewpoint recognition within the environments. Finally, we describe steps toward a unified 3D parsing of everyday scenes: (i) the ability to localize geometric primitives in images, such as cuboids and cylinders, which often comprise many everyday objects, and (ii) an integrated system to extract the 3D structure of the scene and objects depicted in an image.