Make3D: Learning 3D Scene Structure from a Single Still Image

Authors:
Ashutosh Saxena;Min Sun;Andrew Y. Ng
Affiliations:
Stanford University, CA;Princeton University, NJ;Stanford University, CA
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2009

Cited 52

A sketch-based interface for photo pop-up
Proceedings of the 6th Eurographics Symposium on Sketch-Based Interfaces and Modeling

Image-based street-side city modeling
ACM SIGGRAPH Asia 2009 papers

Object Surface Reconstruction from One Camera System
FGIT '09 Proceedings of the 1st International Conference on Future Generation Information Technology

Image warps for artistic perspective manipulation
ACM SIGGRAPH 2010 papers

A cognitive approach for effective coding and transmission of 3D video
Proceedings of the international conference on Multimedia

Scene carving: scene consistent image retargeting
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part I

Blocks world revisited: image understanding using qualitative geometry and mechanics
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV

Occlusion boundary detection using pseudo-depth
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part IV

A dynamic programming approach to reconstructing building interiors
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V

A close-form iterative algorithm for depth inferring from a single image
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V

Thinking inside the box: using appearance models and context based on room geometry
ECCV'10 Proceedings of the 11th European conference on Computer vision: Part VI

A smartphone-based obstacle sensor for the visually impaired
UIC'10 Proceedings of the 7th international conference on Ubiquitous intelligence and computing

Context modeling in computer vision: techniques, implications, and applications
Multimedia Tools and Applications

Retrieving images of similar geometrical configuration
ISVC'10 Proceedings of the 6th international conference on Advances in visual computing - Volume Part II

Learning non-coplanar scene models by exploring the height variation of tracked objects
ACCV'10 Proceedings of the 10th Asian conference on Computer vision - Volume Part III

Design your room: adding virtual objects to a real indoor scenario
CHI '11 Extended Abstracts on Human Factors in Computing Systems

Toward coherent object detection and scene layout understanding
Image and Vision Computing

A cognitive approach for effective coding and transmission of 3D video
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) - Special section on ACM multimedia 2010 best paper candidates, and issue on social media

High-quality tactile paintings
Journal on Computing and Cultural Heritage (JOCCH)

On consistent inter-view synthesis for autostereoscopic displays
3D Research

3D modeling from multiple images
ISNN'10 Proceedings of the 7th international conference on Advances in Neural Networks - Volume Part II

Smart interface for reshaping photos in 3D
Proceedings of the 2012 ACM international conference on Intelligent User Interfaces

Recovering depth map from video with moving objects
PSIVT'11 Proceedings of the 5th Pacific Rim conference on Advances in Image and Video Technology - Volume Part II

Interactive images: cuboid proxies for smart image manipulation
ACM Transactions on Graphics (TOG) - SIGGRAPH 2012 Conference Proceedings

3D reconstruction of polyhedral objects from single perspective projections using cubic corner
3D Research

Micro perceptual human computation for visual tasks
ACM Transactions on Graphics (TOG)

Real-time estimation of 3D scene geometry from a single image
Pattern Recognition

Magnetic augmented reality: virtual objects in your space
Proceedings of the International Working Conference on Advanced Visual Interfaces

Which facial profile do humans expect after seeing a frontal view? a comparison with a linear face model
ACM Transactions on Applied Perception (TAP)

Learning to place new objects in a scene
International Journal of Robotics Research

Enabling warping on stereoscopic images
ACM Transactions on Graphics (TOG) - Proceedings of ACM SIGGRAPH Asia 2012

Object Detection using Geometrical Context Feedback
International Journal of Computer Vision

An interactive system of stereoscopic video conversion
Proceedings of the 20th ACM international conference on Multimedia

3D multimedia signal processing
Proceedings of the 20th ACM international conference on Multimedia

Spreading algorithm for efficient vegetation detection in cluttered outdoor environments
Robotics and Autonomous Systems

Patch based synthesis for single depth image super-resolution
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III

Shapecollage: occlusion-aware, example-based shape interpretation
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III

Depth extraction from video using non-parametric sampling
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V

Shape from angle regularity
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI

Efficient exact inference for 3d indoor scene understanding
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI

Road scene segmentation from a single image
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VII

Semantic structure from motion: a novel framework for joint object recognition and 3d reconstruction
Proceedings of the 15th international conference on Theoretical Foundations of Computer Vision: outdoor and large-scale real-world scene analysis

Combining monocular geometric cues with traditional stereo cues for consumer camera stereo
ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume 2

Multiple ground plane estimation for 3D scene understanding using a monocular camera
Proceedings of the 27th Conference on Image and Vision Computing New Zealand

A generic model to compose vision modules for holistic scene understanding
ECCV'10 Proceedings of the 11th European conference on Trends and Topics in Computer Vision - Volume Part I

A brain informatics approach to explain the oblique effect via depth statistics
BI'12 Proceedings of the 2012 international conference on Brain Informatics

An object expression system using depth-maps
Multimedia Tools and Applications

Dynamic objects effect on visibility analysis in 3d urban environments
W2GIS'13 Proceedings of the 12th international conference on Web and Wireless Geographical Information Systems

Contextually guided semantic labeling and search for three-dimensional point clouds
International Journal of Robotics Research

Depth recovery from a single defocused image based on depth locally consistency
Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service

Object detection, shape recovery, and 3D modelling by depth-encoded hough voting
Computer Vision and Image Understanding

Real-Time 3D Depth Generation for Stereoscopic Video Applications with Thread-Level Superscalar-Pipeline Parallelization
Journal of Signal Processing Systems

Journal of Signal Processing Systems

Quantified Score

Hi-index	0.14

Abstract

We consider the problem of estimating detailed 3D structure from a single still image of an unstructured environment. Our goal is to create 3D models that are both quantitatively accurate and visually pleasing. For each small homogeneous patch in the image, we use a Markov Random Field (MRF) to infer a set of "plane parameters" that capture both the 3D location and the 3D orientation of the patch. The MRF, trained via supervised learning, models both image depth cues and the relationships between different parts of the image. Other than assuming that the environment is made up of a number of small planes, our model makes no explicit assumptions about the structure of the scene; this enables the algorithm to capture much more detailed 3D structure than prior art does, and also gives a much richer experience in the 3D flythroughs created using image-based rendering, even for scenes with significant nonvertical structure. Using this approach, we have created qualitatively correct 3D models for 64.9 percent of 588 images downloaded from the Internet. We have also extended our model to produce large-scale 3D models from a few images.
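
The abstract does not spell out how a single set of plane parameters can carry both location and orientation, so the following is only a minimal sketch under an assumed convention: a 3-vector alpha such that every point q on the patch's plane satisfies alpha . q = 1, with the camera at the origin. Under that assumption the direction of alpha is the plane normal, 1/|alpha| is the distance from the camera to the plane, and the depth along a unit viewing ray r is 1 / (alpha . r). The function names are hypothetical and the MRF inference itself is not shown.

```python
import math


def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def plane_parameters(normal, distance):
    """Assumed convention: alpha = n / distance, so alpha . q = 1 on the plane."""
    n = normalize(normal)
    return tuple(c / distance for c in n)


def depth_along_ray(alpha, ray):
    """Depth at which a ray from the camera origin meets the plane."""
    r = normalize(ray)
    denom = dot(alpha, r)
    if denom <= 0:
        raise ValueError("ray does not hit the plane in front of the camera")
    return 1.0 / denom


def point_on_plane(alpha, ray):
    """3D point where the ray meets the plane."""
    r = normalize(ray)
    d = depth_along_ray(alpha, r)
    return tuple(d * c for c in r)


# A fronto-parallel wall 10 units in front of the camera (illustrative values).
alpha = plane_parameters((0.0, 0.0, 1.0), 10.0)
centre = point_on_plane(alpha, (0.0, 0.0, 1.0))
oblique = point_on_plane(alpha, (1.0, 0.0, 1.0))
```

With this convention, one vector per patch is enough to place every pixel of the patch in 3D: each pixel contributes its own viewing ray, and the shared alpha fixes where those rays stop.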