3-D Depth Reconstruction from a Single Still Image

Authors:
Ashutosh Saxena;Sung H. Chung;Andrew Y. Ng
Affiliations:
Computer Science Department, Stanford University, Stanford, USA 94305;Computer Science Department, Stanford University, Stanford, USA 94305;Computer Science Department, Stanford University, Stanford, USA 94305
Venue:
International Journal of Computer Vision
Year:
2008

Abstract

We consider the task of 3-D depth estimation from a single still image. We take a supervised learning approach to this problem: we begin by collecting a training set of monocular images (of unstructured indoor and outdoor environments that include forests, sidewalks, trees, buildings, etc.) and their corresponding ground-truth depthmaps. We then apply supervised learning to predict the value of the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a hierarchical, multiscale Markov Random Field (MRF) that incorporates multiscale local and global image features, and models both the depths and the relations between depths at different points in the image. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps. We further propose a model that incorporates both monocular cues and stereo (triangulation) cues, to obtain significantly more accurate depth estimates than is possible using either monocular or stereo cues alone.
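
To make the idea of modeling depths together with the relations between neighboring depths more concrete, the following is a minimal sketch and not the paper's model. It assumes a 1-D chain of image patches, a Gaussian form for every term, and hand-picked variances; the function name `map_depths` and all of its parameters are hypothetical. Each patch has a local monocular depth estimate (standing in for the output of a learned predictor), neighboring patches are encouraged to have similar depths, and an optional stereo measurement adds a further data term where it is available. Under these assumptions the most probable depths solve a tridiagonal linear system.

```python
def map_depths(mono, sigma_data=1.0, sigma_smooth=0.5,
               stereo=None, sigma_stereo=0.2):
    """MAP depths for a toy 1-D chain Gaussian MRF (illustrative assumption).

    mono   : list of per-patch depth estimates from local monocular features
    stereo : optional list of the same length; None marks patches where
             no stereo measurement is available
    Minimizes
        sum_i (d_i - mono_i)^2 / (2 sigma_data^2)
      + sum_i (d_i - d_{i+1})^2 / (2 sigma_smooth^2)
      + sum_{i with stereo} (d_i - stereo_i)^2 / (2 sigma_stereo^2)
    """
    n = len(mono)
    if n == 0:
        return []

    w_data = 1.0 / sigma_data ** 2
    w_smooth = 1.0 / sigma_smooth ** 2
    w_stereo = 1.0 / sigma_stereo ** 2

    # Build the tridiagonal system A d = rhs from the quadratic energy.
    diag = [0.0] * n
    rhs = [0.0] * n
    for i in range(n):
        neighbours = (1 if i > 0 else 0) + (1 if i < n - 1 else 0)
        diag[i] = w_data + neighbours * w_smooth
        rhs[i] = w_data * mono[i]
        if stereo is not None and stereo[i] is not None:
            diag[i] += w_stereo
            rhs[i] += w_stereo * stereo[i]
    off = -w_smooth  # every off-diagonal entry couples two neighbours

    # Thomas algorithm: forward elimination, then back substitution.
    c = [0.0] * n  # modified upper diagonal
    r = [0.0] * n  # modified right-hand side
    c[0] = off / diag[0] if n > 1 else 0.0
    r[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom if i < n - 1 else 0.0
        r[i] = (rhs[i] - off * r[i - 1]) / denom

    depths = [0.0] * n
    depths[-1] = r[-1]
    for i in range(n - 2, -1, -1):
        depths[i] = r[i] - c[i] * depths[i + 1]
    return depths


# Example: a noisy monocular estimate in the middle is pulled toward its
# neighbours, and a stereo reading at the same patch pulls it further.
mono_only = map_depths([1.0, 1.0, 10.0, 1.0, 1.0])
with_stereo = map_depths([1.0, 1.0, 10.0, 1.0, 1.0],
                         stereo=[None, None, 1.2, None, None])
```

The sketch only illustrates why combining a local data term with a neighborhood term can correct estimates that local features alone get wrong. A 2-D image grid, multiple scales, learned parameters, and non-Gaussian potentials would all change the inference procedure.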