Reconfigurable models for scene recognition

Authors:
Sobhan Naderi Parizi
Affiliations:
School of Engineering, Brown University
Venue:
CVPR '12 Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Year:
2012

Cited by (5):
Object detection using strongly-supervised deformable part models. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part I
Mixture component identification and learning for visual recognition. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI
Spring lattice counting grids: scene recognition using deformable positional constraints. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI
Hierarchical space tiling for scene modeling. ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part II
GIANT: geo-informative attributes for location recognition and exploration. Proceedings of the 21st ACM international conference on Multimedia

Abstract

We propose a new latent variable model for scene recognition. Our approach represents a scene as a collection of region models (“parts”) arranged in a reconfigurable pattern. We partition an image into a predefined set of regions and use a latent variable to specify which region model is assigned to each image region. In our current implementation we use a bag of words representation to capture the appearance of an image region. The resulting method generalizes a spatial bag of words approach that relies on a fixed model for the bag of words in each image region. Our models can be trained using both generative and discriminative methods. In the generative setting we use the Expectation-Maximization (EM) algorithm to estimate model parameters from a collection of images with category labels. In the discriminative setting we use a latent structural SVM (LSSVM). We note that LSSVMs can be very sensitive to initialization and demonstrate that generative training with EM provides a good initialization for discriminative training with LSSVM.
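The latent assignment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear scoring function, the independent per-region maximization, and all names (`score_region`, `best_assignment`, `fixed_assignment_score`) are assumptions made for illustration. Each image region is summarized by a bag-of-words histogram, each region model ("part") is a weight vector, and the latent variable picks one part per region.

```python
from typing import List, Sequence, Tuple

Histogram = Sequence[float]  # bag-of-words counts for one image region


def score_region(weights: Sequence[float], hist: Histogram) -> float:
    """Linear score of one region model on one histogram (assumed form)."""
    return sum(w * h for w, h in zip(weights, hist))


def best_assignment(
    region_hists: List[Histogram],
    part_weights: List[List[float]],
) -> Tuple[float, List[int]]:
    """Choose, independently for each region, the highest-scoring part.

    Returns the total score and the latent assignment
    (one part index per image region).
    """
    total = 0.0
    assignment: List[int] = []
    for hist in region_hists:
        scores = [score_region(w, hist) for w in part_weights]
        best = max(range(len(scores)), key=scores.__getitem__)
        assignment.append(best)
        total += scores[best]
    return total, assignment


def fixed_assignment_score(
    region_hists: List[Histogram],
    part_weights: List[List[float]],
    assignment: List[int],
) -> float:
    """Spatial bag-of-words special case: region r always uses part assignment[r]."""
    return sum(
        score_region(part_weights[p], h)
        for h, p in zip(region_hists, assignment)
    )


# Toy data (made up): 2 regions, 2 parts, vocabulary of 3 visual words.
hists = [[3, 0, 1], [0, 2, 2]]
weights = [[1, 0, 0], [0, 1, 0.5]]
print(best_assignment(hists, weights))
print(fixed_assignment_score(hists, weights, [1, 0]))
```

Under these assumptions, a fixed assignment can never score higher than the maximizing one, which is one way to read the claim that the model generalizes a spatial bag of words approach.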