A generic model to compose vision modules for holistic scene understanding

  • Authors:
  • Congcong Li (School of Electrical & Computer Engineering, Cornell University)
  • Adarsh Kowdle (School of Electrical & Computer Engineering, Cornell University)
  • Ashutosh Saxena (Department of Computer Science, Cornell University)
  • Tsuhan Chen (School of Electrical & Computer Engineering, Cornell University)

  • Venue:
  • ECCV'10: Proceedings of the 11th European Conference on Computer Vision Workshops, Trends and Topics in Computer Vision, Part I
  • Year:
  • 2010

Abstract

The problem of holistic scene understanding encompasses many vision tasks, such as depth estimation, scene categorization, and event categorization. Each of these tasks explores particular aspects of the scene, yet the tasks are related in that they represent attributes of the same scene. The intuition is that one task can provide meaningful attributes to aid the learning process of another. In this work, we propose a generic model (together with learning and inference techniques) for connecting different vision tasks in the form of a two-layer cascade. Our model treats the first layer as a hidden layer, where the latent variables are inferred by feedback from the second layer. The feedback mechanism allows the first-layer classifiers to focus on the more important image modes and draws their outputs toward "attributes" rather than the original "labels". Our model also automatically discovers sparse connections between the learned attributes on the first layer and the target task on the second layer. Note that in our model, the same vision tasks can act as attribute learners as well as target tasks, depending on the layer on which they are placed. In extensive experiments, we show that the same proposed model improves performance on all the tasks we consider: single-image depth estimation, scene categorization, saliency detection, and event categorization.
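
The sketch below illustrates the cascade structure the abstract describes: per-task first-layer classifiers whose probabilistic outputs act as scene "attributes", combined by an L1-regularized second-layer classifier whose sparsity echoes the model's automatic discovery of sparse attribute-to-target connections. This is a minimal illustrative approximation, not the authors' implementation: it uses synthetic data and scikit-learn, and it omits the paper's key feedback step in which the second layer refines the first layer's latent variables.

```python
# Minimal sketch of a two-layer cascade over synthetic data (assumption:
# scikit-learn models stand in for the paper's task-specific classifiers;
# the feedback/latent-variable inference from the paper is NOT modeled).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical per-task features and binary labels for three auxiliary
# vision tasks (names are placeholders for the tasks in the abstract).
n = 500
task_feats = {t: rng.normal(size=(n, 20)) for t in ["depth", "scene", "saliency"]}
task_labels = {t: rng.integers(0, 2, size=n) for t in task_feats}
target_y = rng.integers(0, 4, size=n)  # e.g., an event-category target task

# Layer 1: one classifier per auxiliary task; its class probabilities
# serve as scene "attributes" rather than final labels.
layer1 = {t: LogisticRegression(max_iter=1000).fit(X, task_labels[t])
          for t, X in task_feats.items()}
# (In practice these attribute scores would come from held-out
# predictions; training-set predictions are used here for brevity.)
attributes = np.hstack([clf.predict_proba(task_feats[t])
                        for t, clf in layer1.items()])

# Layer 2: the L1 penalty zeroes out weights, yielding sparse
# connections between learned attributes and the target task.
layer2 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
layer2.fit(attributes, target_y)
print("nonzero attribute connections:", np.count_nonzero(layer2.coef_))
```

Because the same task can appear on either layer, one could rerun this sketch with, say, the event task on the first layer and depth estimation as the second-layer target; the paper's point is that a single generic model covers both roles.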