Mean Shift: A Robust Approach Toward Feature Space Analysis
IEEE Transactions on Pattern Analysis and Machine Intelligence
Large Margin Methods for Structured and Interdependent Output Variables
The Journal of Machine Learning Research
Recovering Surface Layout from an Image
International Journal of Computer Vision
Learning Spatial Context: Using Stuff to Find Things
ECCV '08 Proceedings of the 10th European Conference on Computer Vision: Part I
Learning structural SVMs with latent variables
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Cutting-plane training of structural SVMs
Machine Learning
Object Detection with Discriminatively Trained Part-Based Models
IEEE Transactions on Pattern Analysis and Machine Intelligence
Efficient exact inference for 3d indoor scene understanding
ECCV'12 Proceedings of the 12th European Conference on Computer Vision - Volume Part VI
Human-centric indoor environment modeling from depth videos
ECCV'12 Proceedings of the 12th European Conference on Computer Vision - Volume 2
Discriminative learning with latent variables for cluttered indoor scene understanding
Communications of the ACM
We address the problem of understanding an indoor scene from a single image, in terms of recovering the layouts of its faces (floor, ceiling, walls) and furniture. A major challenge of this task is that most indoor scenes are cluttered with furniture and decorations, whose appearance varies drastically across scenes and can hardly be modeled (or even hand-labeled) consistently. In this paper we tackle this problem by introducing latent variables to account for clutter, so that the observed image is jointly explained by the face and clutter layouts. Model parameters are learned in a maximum-margin formulation, constrained by additional prior energy terms that define the role of the latent variables. Our approach accounts for and infers indoor clutter layouts without requiring hand-labeled clutter in the training set, yet it outperforms the state-of-the-art method of Hedau et al. [4], which requires clutter labels.
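The learning scheme the abstract describes can be illustrated with a minimal sketch, not the authors' implementation: alternate between completing the latent "clutter" variable for the labeled "face" output and taking a subgradient step on the margin violation (a latent structured perceptron with margin). The features, binary label/latent spaces, and all function names here are toy stand-ins; the paper's prior energy terms constraining the latent variables are omitted.

```python
# Toy sketch of latent-variable max-margin learning (hypothetical, not the
# authors' code): the observed label y ("face layout") and latent variable h
# ("clutter layout") jointly explain the input x through a joint feature map.

def joint_feature(x, y, h):
    """Toy joint feature of input x, observed label y, latent variable h."""
    return [x * y,   # face-layout compatibility
            x * h,   # clutter-layout compatibility
            y * h,   # face/clutter interaction (joint explanation)
            1.0]     # bias

def score(w, x, y, h):
    return sum(wi * fi for wi, fi in zip(w, joint_feature(x, y, h)))

def infer(w, x, labels=(-1, 1), latents=(-1, 1)):
    """Jointly infer the best (label, latent) pair under the current model."""
    return max(((y, h) for y in labels for h in latents),
               key=lambda p: score(w, x, p[0], p[1]))

def train(data, labels=(-1, 1), latents=(-1, 1), epochs=20, lr=0.1):
    w = [0.0] * 4
    for _ in range(epochs):
        for x, y in data:
            # 1. Latent completion: best h for the ground-truth label y.
            h_star = max(latents, key=lambda h: score(w, x, y, h))
            # 2. Loss-augmented inference over all (label, latent) pairs,
            #    using 0/1 loss on the label as a toy task loss.
            y_hat, h_hat = max(((yy, hh) for yy in labels for hh in latents),
                               key=lambda p: score(w, x, p[0], p[1]) + (p[0] != y))
            # 3. Subgradient step only when the margin is violated.
            if score(w, x, y_hat, h_hat) + (y_hat != y) > score(w, x, y, h_star):
                good = joint_feature(x, y, h_star)
                bad = joint_feature(x, y_hat, h_hat)
                w = [wi + lr * (g - b) for wi, g, b in zip(w, good, bad)]
    return w
```

In the actual problem, `infer` would be a structured search over face and clutter layouts rather than a loop over two binary values, and the update would additionally respect the prior energy terms that define what the latent variables may explain.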