Streetscenes: towards scene understanding in still images

Authors:
Tomaso A. Poggio; Stanley Michael Bileschi
Affiliations:
Massachusetts Institute of Technology; Massachusetts Institute of Technology
Venue:
Streetscenes: towards scene understanding in still images
Year:
2006


Abstract

This thesis describes an effort to construct a scene understanding system that can analyze the content of real images. Building the system required us to address many of the fundamental questions that every student of object recognition faces: the choice of data set, the choice of success measure, the representation of image content, the selection of an inference engine, and the representation of relations between objects. The main test bed for our system is the CBCL StreetScenes database, a carefully labeled set of images that was much larger than any similar data set available at the time it was collected. Each image in the data set was labeled for nine common classes, including cars, pedestrians, roads, and trees. Our system represents each image using a set of features based on a model of the human visual system constructed in our lab. We demonstrate that this biologically motivated image representation, along with its extensions, is an effective representation for object detection and yields unprecedented levels of detection accuracy. Like biological vision systems, our system uses hierarchical representations, so we explore the possible ways of combining information across the hierarchy into the final perception. Our system is trained using standard machine learning machinery, which was first applied to computer vision in earlier work by Prof. Poggio and others. We demonstrate how the same standard methods can also be used to model relations between objects in images, capturing context information. The resulting system uses a unified set of tools and image representations to detect and localize compact objects such as cars, amorphous objects such as trees and roads, and the relations between objects within the scene. The same representation also excels at identifying objects in clutter without scanning the image.
Much of the work presented in this thesis was devoted to a rigorous comparison of our system with alternative object recognition systems. The results of these experiments support the effectiveness of simple feed-forward systems for the basic tasks involved in scene understanding. We make our results fully available to the public by publishing our code and data sets, in the hope that others may improve and extend them. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)