Using the forest to see the trees: exploiting context for visual object detection and localization

Authors:
A. Torralba;K. P. Murphy;W. T. Freeman
Affiliations:
Massachusetts Institute of Technology, Cambridge, MA;University of British Columbia, Vancouver, Canada;Massachusetts Institute of Technology, Cambridge, MA
Venue:
Communications of the ACM
Year:
2010

Abstract

Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been much progress, and object recognition systems already operate in commercial products. However, most object detection algorithms perform an exhaustive search across all locations and scales in the image, comparing local image regions with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem by brute force. In the real world, objects tend to covary with other objects, providing a rich collection of contextual associations. These contextual associations can be used to reduce the search space by looking only in places where the object is expected to be; this also improves performance by rejecting patterns that look like the target but appear in unlikely places. Most modeling attempts so far have defined the context of an object in terms of other previously recognized objects. The drawback of this approach is that inferring the context becomes as difficult as detecting each object. An alternative view of context relies on using the entire scene information holistically. This approach is algorithmically attractive since it dispenses with the need for a prior step of individual object recognition. In this paper, we use a probabilistic framework for encoding the relationships between context and object properties, and we show how an integrated system provides improved performance. We view this as a significant step toward general-purpose machine vision systems.
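
The idea of using context both to shrink the search space and to reject look-alike patterns in unlikely places can be illustrated with a toy sketch. This is not the authors' model: the Gaussian row prior, the function names `row_prior` and `detect_with_context`, the `min_prior` threshold, and the score grid are all assumptions made up for illustration. The sketch adds a log prior over image rows (standing in for what a scene-level context predictor might supply) to hypothetical local detector log-scores, and skips rows the prior considers very unlikely.

```python
import math


def row_prior(num_rows, expected_row, spread):
    """Normalized Gaussian-shaped prior over image rows.

    Stands in for the output of a hypothetical scene-level context model.
    """
    weights = [
        math.exp(-0.5 * ((r - expected_row) / spread) ** 2)
        for r in range(num_rows)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def detect_with_context(local_scores, prior, min_prior=1e-3):
    """Combine local log-scores with a row prior and prune unlikely rows.

    Returns (best_cell, evaluated), where best_cell is a (row, col) pair and
    evaluated counts how many grid cells were actually scored.
    """
    best_cell, best_value, evaluated = None, -math.inf, 0
    for r, row in enumerate(local_scores):
        if prior[r] < min_prior:
            continue  # context rules this row out, so it is never searched
        for c, score in enumerate(row):
            evaluated += 1
            value = score + math.log(prior[r])  # log-score plus log prior
            if value > best_value:
                best_cell, best_value = (r, c), value
    return best_cell, evaluated


# Toy grid of local detector log-scores (made-up numbers).
scores = [
    [0.1, 2.0, 0.0],  # row 0: strong look-alike pattern at (0, 1)
    [0.0, 0.2, 0.1],
    [0.3, 0.1, 1.5],  # row 2: the intended target at (2, 2)
    [0.0, 0.0, 0.2],
]
prior = row_prior(num_rows=4, expected_row=2, spread=0.5)
best, evaluated = detect_with_context(scores, prior)
print(best, evaluated)
```

In this toy grid, a purely local search would pick the cell with the highest raw score, which sits in row 0. With the prior centered on row 2, row 0 falls below the threshold and is never scored, so fewer cells are evaluated and the look-alike is rejected.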