Blocks world revisited: image understanding using qualitative geometry and mechanics

Authors:
Abhinav Gupta;Alexei A. Efros;Martial Hebert
Affiliations:
Robotics Institute, Carnegie Mellon University;Robotics Institute, Carnegie Mellon University;Robotics Institute, Carnegie Mellon University
Venue:
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV
Year:
2010

References (9):

Visual Event Classification via Force Dynamics
Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence

A Dynamic Bayesian Network Model for Autonomous 3D Reconstruction from a Single Indoor Image
CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2

Recovering Surface Layout from an Image
International Journal of Computer Vision

A stochastic grammar of images
Foundations and Trends® in Computer Graphics and Vision

Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers
ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part I

Make3D: Learning 3D Scene Structure from a Single Still Image
IEEE Transactions on Pattern Analysis and Machine Intelligence

The ACRONYM model-based vision system
IJCAI'79 Proceedings of the 6th international joint conference on Artificial intelligence - Volume 1

Stages as Models of Scene Geometry
IEEE Transactions on Pattern Analysis and Machine Intelligence

Thinking inside the box: using appearance models and context based on room geometry
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part VI

Cited by (15):

Thinking inside the box: using appearance models and context based on room geometry
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part VI

Context modeling in computer vision: techniques, implications, and applications
Multimedia Tools and Applications

Characterizing structural relationships in scenes using graph kernels
ACM SIGGRAPH 2011 papers

Toward coherent object detection and scene layout understanding
Image and Vision Computing

Interactive images: cuboid proxies for smart image manipulation
ACM Transactions on Graphics (TOG) - SIGGRAPH 2012 Conference Proceedings

Topological spatial relations for active visual search
Robotics and Autonomous Systems

Acquiring 3D indoor environments with variability and repetition
ACM Transactions on Graphics (TOG) - Proceedings of ACM SIGGRAPH Asia 2012

Extracting 3D scene-consistent object proposals and depth from stereo images
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V

People watching: human actions as a cue for single view geometry
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V

Indoor segmentation and support inference from RGBD images
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V

Beyond the line of sight: labeling the underlying surfaces
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V

Efficient exact inference for 3D indoor scene understanding
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI

Combining monocular geometric cues with traditional stereo cues for consumer camera stereo
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume 2

Semantic road segmentation via multi-scale ensembles of learned features
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume 2

Introduction to the special issue on learning semantics
Machine Learning

Abstract

Since most current scene understanding approaches operate either on the 2D image or using a surface-based representation, they do not allow reasoning about the physical constraints within the 3D scene. Inspired by the "Blocks World" work in the 1960's, we present a qualitative physical representation of an outdoor scene where objects have volume and mass, and relationships describe 3D structure and mechanical configurations. Our representation allows us to apply powerful global geometric constraints between 3D volumes as well as the laws of statics in a qualitative manner. We also present a novel iterative "interpretation-by-synthesis" approach where, starting from an empty ground plane, we progressively "build up" a physically-plausible 3D interpretation of the image. For surface layout estimation, our method demonstrates an improvement in performance over the state-of-the-art [9]. But more importantly, our approach automatically generates 3D parse graphs which describe qualitative geometric and mechanical properties of objects and relationships between objects within an image.
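
The "build up" idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: the `Block` fields, the `score` values, and the support rule below are hypothetical stand-ins, assuming only that candidate volumes are added one at a time to an initially empty ground plane and that a volume is accepted only if it is qualitatively supported.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Block:
    """A hypothetical candidate volume (not the paper's representation)."""
    name: str
    score: float              # assumed image-evidence score, higher is better
    rests_on: Optional[str]   # name of the supporting block, None = ground


def is_supported(block: Block, placed_names: set) -> bool:
    """Toy stand-in for qualitative statics: a block is stable if it rests
    on the ground or on a block that has already been placed."""
    return block.rests_on is None or block.rests_on in placed_names


def build_interpretation(candidates: List[Block]) -> List[Block]:
    """Greedily add the best-scoring supported block until none can be added."""
    placed: List[Block] = []          # start from an empty ground plane
    placed_names: set = set()
    remaining = sorted(candidates, key=lambda b: b.score, reverse=True)
    progress = True
    while progress:
        progress = False
        for block in remaining:
            if is_supported(block, placed_names):
                placed.append(block)
                placed_names.add(block.name)
                remaining.remove(block)   # safe: we break right after
                progress = True
                break
    return placed


candidates = [
    Block("roof", 0.9, rests_on="building"),
    Block("building", 0.8, rests_on=None),
    Block("floating", 0.7, rests_on="missing"),
]
result = [b.name for b in build_interpretation(candidates)]
print(result)
```

In this sketch the roof has the highest score but is only accepted after the building that supports it has been placed, and the block with no available support is never added. The real system reasons over image evidence, geometry, and mechanics jointly; the greedy loop here only shows the incremental, physically constrained flavor of the search.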