Thinking inside the box: using appearance models and context based on room geometry

  • Authors:
  • Varsha Hedau; Derek Hoiem; David Forsyth

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign; Department of Computer Science, University of Illinois at Urbana-Champaign; Department of Computer Science, University of Illinois at Urbana-Champaign

  • Venue:
  • ECCV'10: Proceedings of the 11th European Conference on Computer Vision, Part VI
  • Year:
  • 2010

Abstract

In this paper, we show that a geometric representation of an object occurring in indoor scenes, along with rich scene structure, can be used to produce a detector for that object in a single image. Using perspective cues from the global scene geometry, we first develop a 3D-based object detector. This detector is competitive with an image-based detector built using state-of-the-art methods; however, combining the two produces a notably improved detector, because it unifies contextual and geometric information. We then use a probabilistic model that explicitly incorporates the constraints imposed by spatial layout - the locations of the walls and floor in the image - to refine the 3D object estimates. We use an existing approach to compute the spatial layout [1], and apply constraints such as that objects are supported by the floor and cannot protrude through the walls. The resulting detector (a) has significantly improved accuracy compared to state-of-the-art 2D detectors and (b) gives a 3D interpretation of the object's location derived from a 2D image. We evaluate the detector on beds, for which we give extensive quantitative results derived from images of real scenes.
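
As a rough illustration of the kind of fusion the abstract describes, the sketch below combines a 2D detector score and a 3D geometric score, then gates the result with simple layout-consistency checks (support by the floor, containment within the walls) for an assumed axis-aligned room box. This is a minimal sketch, not the authors' probabilistic model; all names (Room, Box3D, combined_score, the weights) are hypothetical.

```python
# Hypothetical sketch: fusing a 2D detector score with room-layout constraints.
# All names and thresholds are illustrative assumptions, not the paper's model.
from dataclasses import dataclass


@dataclass
class Room:
    # Axis-aligned room box in a 3D scene frame (meters).
    x_min: float; x_max: float
    y_min: float; y_max: float   # wall extents
    floor_z: float               # height of the floor plane


@dataclass
class Box3D:
    # Candidate object cuboid, axis-aligned with the room.
    x_min: float; x_max: float
    y_min: float; y_max: float
    z_min: float; z_max: float   # z_min should rest on the floor


def layout_consistency(obj: Box3D, room: Room, support_tol: float = 0.05) -> float:
    """Return 1.0 if the candidate respects the layout constraints, else 0.0:
    (a) supported by the floor, (b) contained within the walls."""
    supported = abs(obj.z_min - room.floor_z) <= support_tol
    inside_walls = (room.x_min <= obj.x_min and obj.x_max <= room.x_max and
                    room.y_min <= obj.y_min and obj.y_max <= room.y_max)
    return 1.0 if (supported and inside_walls) else 0.0


def combined_score(score_2d: float, score_3d: float,
                   obj: Box3D, room: Room,
                   w_2d: float = 0.5, w_3d: float = 0.5) -> float:
    """Blend image-based and geometry-based scores, then zero out candidates
    that violate the hard layout constraints."""
    return (w_2d * score_2d + w_3d * score_3d) * layout_consistency(obj, room)


if __name__ == "__main__":
    room = Room(0.0, 4.0, 0.0, 5.0, floor_z=0.0)
    bed = Box3D(0.5, 2.5, 0.5, 2.0, 0.0, 0.6)        # rests on the floor
    floating = Box3D(0.5, 2.5, 0.5, 2.0, 0.8, 1.4)   # violates floor support
    print(combined_score(0.7, 0.6, bed, room))       # positive fused score
    print(combined_score(0.7, 0.6, floating, room))  # 0.0, pruned by layout
```

The hard 0/1 gating above stands in for what the paper formulates probabilistically; in practice one would score layout consistency softly and learn the fusion weights.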