Geometric Context from a Single Image
Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV '05), Volume 1
Putting Objects in Perspective
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), Volume 2
Depth from Familiar Objects: A Hierarchical Model for 3D Scenes
Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), Volume 2
Groups of Adjacent Contour Segments for Object Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence
LabelMe: A Database and Web-Based Tool for Image Annotation
International Journal of Computer Vision
3D Urban Scene Modeling Integrating Recognition and Reconstruction
International Journal of Computer Vision
Segmentation and Recognition Using Structure from Motion Point Clouds
Proceedings of the 10th European Conference on Computer Vision (ECCV '08), Part I
Make3D: Learning 3D Scene Structure from a Single Still Image
IEEE Transactions on Pattern Analysis and Machine Intelligence
Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics
Proceedings of the 11th European Conference on Computer Vision (ECCV '10), Part IV
Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery
Proceedings of the 11th European Conference on Computer Vision (ECCV '10), Part V
Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry
Proceedings of the 11th European Conference on Computer Vision (ECCV '10), Part VI
Detecting objects in complex scenes while recovering the scene layout is a critical capability in many vision-based applications. In this work, we advocate the importance of geometric contextual reasoning for object recognition. We start from the intuition that the locations and poses of objects in 3D space are not arbitrarily distributed but are constrained by the fact that objects must lie on one or more supporting surfaces. We model such supporting surfaces by means of hidden (i.e., not explicitly observed) parameters and formulate joint scene reconstruction and object recognition as the problem of finding the set of parameters that maximizes the joint probability of a number of detected objects lying on K supporting planes, given the observations. As a key ingredient for solving this optimization problem, we demonstrate a novel relationship between an object's location and pose in the image and the scene layout parameters (i.e., the normals of one or more supporting planes in 3D and the camera pose, location, and focal length). Using a novel probabilistic formulation and the above relationship, our method has the unique ability to jointly: i) reduce the false alarm and false negative object detection rates; ii) recover object locations and supporting planes within the 3D camera reference system; iii) infer camera parameters (viewpoint and focal length) from a single uncalibrated image. Quantitative and qualitative experimental evaluation on two datasets (the desk-top dataset [1] and LabelMe [2]) supports our theoretical claims.
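The page reproduces only the abstract, so the paper's exact objective is not shown here. As a hedged illustration of the kind of joint MAP problem the abstract describes, one might write the following, where O = {o_1, ..., o_n} are candidate detections, Pi = {pi_1, ..., pi_K} the hidden supporting-plane parameters, C the camera pose and focal length, and I the image evidence; all symbols are our assumptions, not the paper's notation:

    % Illustrative MAP objective; symbols are assumptions, not the paper's notation.
    \[
    (\hat{O}, \hat{\Pi}, \hat{C})
      = \arg\max_{O,\,\Pi,\,C} \; P(O, \Pi, C \mid I)
      \;\propto\; \arg\max_{O,\,\Pi,\,C} \;
        P(I \mid O, \Pi, C)
        \prod_{i=1}^{n} P\bigl(o_i \mid \pi_{k(i)}, C\bigr)
        \prod_{k=1}^{K} P(\pi_k \mid C)\, P(C)
    \]

Here pi_{k(i)} denotes the plane supporting detection o_i, so the per-object terms encode the constraint that every object must rest on one of the K supporting surfaces.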
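The geometric relationship the abstract refers to, between an object's image location and scale and the supporting-plane and camera parameters, is likewise not spelled out on this page. The following minimal Python/NumPy sketch illustrates one standard instance of that relationship under a pinhole camera model; the function names, conventions, and toy numbers are our own assumptions, not the paper's implementation:

    import numpy as np

    def backproject_to_plane(u, v, f, cx, cy, n, d):
        """Intersect the viewing ray through pixel (u, v) with the plane
        n . X = d in the camera frame (n: unit normal, d: offset)."""
        ray = np.array([(u - cx) / f, (v - cy) / f, 1.0])  # ray direction
        denom = n @ ray
        if abs(denom) < 1e-9:
            return None                    # ray parallel to the plane
        t = d / denom                      # depth along the ray
        return t * ray if t > 0 else None  # keep points in front of the camera

    def expected_image_height(X, h, f, up):
        """Pixel height of an upright object of real height h whose base
        sits at 3D point X, extending along the plane's 'up' direction."""
        top = X + h * up
        return abs(f * (top[1] / top[2] - X[1] / X[2]))

    # Toy example: pinhole camera (X right, Y down, Z forward), f = 800 px,
    # principal point (320, 240), mounted 1.5 m above a horizontal ground plane.
    n = np.array([0.0, -1.0, 0.0])   # "up" in the camera frame
    d = -1.5                         # n . X = d for ground points (Y = 1.5)
    X = backproject_to_plane(320.0, 400.0, 800.0, 320.0, 240.0, n, d)
    print(X)                                         # [0.  1.5  7.5]: base 7.5 m ahead
    print(expected_image_height(X, 1.7, 800.0, n))   # ~181 px for a 1.7 m object

The point of the sketch is the coupling the abstract exploits: once the supporting plane and camera parameters are fixed, a detection's bounding-box bottom determines its 3D location, and its expected pixel height follows, so implausible detections can be rejected and, conversely, consistent detections constrain the plane and camera.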