Joint spatio-temporal depth features fusion framework for 3D structure estimation in urban environment

  • Authors:
  • Mohamad Motasem Nawaf; Alain Trémeau

  • Affiliations:
  • Laboratoire Hubert Curien UMR CNRS 5516, Université Jean Monnet, Saint-Etienne, France (both authors)

  • Venue:
  • ECCV'12 Proceedings of the 12th international conference on Computer Vision - Volume Part III
  • Year:
  • 2012


Abstract

We present a novel approach to improving 3D structure estimation from an image stream in urban scenes. We consider a particular setup in which the camera is mounted on a moving vehicle. Applying traditional structure from motion (SfM) techniques in this case yields poor estimates of the 3D structure for several reasons, such as texture-less images, small baseline variations, and dominant forward camera motion. Our idea is to introduce the monocular depth cues that exist in a single image and to add temporal constraints on the estimated 3D structure. We assume that the scene is made up of small planar patches, obtained with an over-segmentation method, and our goal is to estimate the 3D position of each of these planes. We propose a fusion framework that employs a Markov Random Field (MRF) model to integrate both spatial and temporal depth information. An advantage of our model is that it performs well even when some depth information is missing. Spatial depth information is obtained through a global and local feature extraction method inspired by Saxena et al. [1]. Temporal depth information is obtained via a sparse optical-flow-based structure from motion approach, which reduces estimation ambiguity by enforcing constraints on the camera motion. Finally, we apply a fusion scheme to produce a single 3D structure estimate.
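The abstract's fusion idea can be illustrated with a minimal sketch: treat each planar patch as an MRF node whose data terms pull it toward whichever depth cues (spatial, temporal) are available, with pairwise smoothness between neighboring patches. The quadratic energy below is minimized by Gauss-Seidel coordinate descent. This is an illustrative assumption, not the paper's actual model: the function name `fuse_depths`, the single-scalar-depth-per-patch simplification, the `NaN`-for-missing convention, and the smoothness weight `lam` are all hypothetical choices for the sketch.

```python
import numpy as np

def fuse_depths(d_spatial, d_temporal, neighbors, lam=1.0, iters=100):
    """Fuse per-patch spatial and temporal depth cues with a quadratic
    MRF energy (hypothetical simplification of the paper's model):

        E(d) = sum_i sum_{c in available cues(i)} (d_i - c)^2
             + lam * sum_{(i,j) neighbors} (d_i - d_j)^2

    NaN marks a missing cue, so a patch with only one cue (or none)
    is still estimated via smoothness with its neighbors.  The energy
    is quadratic, so Gauss-Seidel coordinate descent converges.
    """
    n = len(d_spatial)
    # Initialize each patch from whichever cue is available (0 if none).
    d = np.where(np.isnan(d_spatial),
                 np.where(np.isnan(d_temporal), 0.0, d_temporal),
                 d_spatial)
    for _ in range(iters):
        for i in range(n):
            num, den = 0.0, 0.0
            # Data terms: only cues that are actually observed.
            if not np.isnan(d_spatial[i]):
                num += d_spatial[i]; den += 1.0
            if not np.isnan(d_temporal[i]):
                num += d_temporal[i]; den += 1.0
            # Pairwise smoothness with neighboring patches.
            for j in neighbors[i]:
                num += lam * d[j]; den += lam
            if den > 0.0:
                # Closed-form minimizer of E in d_i with others fixed.
                d[i] = num / den
    return d
```

A usage example on a three-patch chain where each cue source covers only some patches: `fuse_depths(np.array([1.0, np.nan, 3.0]), np.array([np.nan, 2.0, np.nan]), {0: [1], 1: [0, 2], 2: [1]})` fills the gaps by borrowing depth from neighbors, which mirrors the abstract's claim that the model "performs well even when some depth information is missing."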