Learning hybrid part filters for scene recognition

Authors:
Yingbin Zheng;Yu-Gang Jiang;Xiangyang Xue
Affiliations:
School of Computer Science, Fudan University, Shanghai, China;School of Computer Science, Fudan University, Shanghai, China;School of Computer Science, Fudan University, Shanghai, China
Venue:
ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V
Year:
2012

Citing 18
Cited 0

WordNet: a lexical database for English. Communications of the ACM

Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope. International Journal of Computer Vision

Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2

Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision

Histograms of Oriented Gradients for Human Detection. CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1 - Volume 01

Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2

Semantic Modeling of Natural Scenes for Content-Based Image Retrieval. International Journal of Computer Vision

Scene Classification Using a Hybrid Generative/Discriminative Approach. IEEE Transactions on Pattern Analysis and Machine Intelligence

Object Detection with Discriminatively Trained Part-Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence

Automatic attribute discovery and characterization from noisy web data. ECCV'10 Proceedings of the 11th European conference on Computer vision: Part I

Efficient object category recognition using classemes. ECCV'10 Proceedings of the 11th European conference on Computer vision: Part I

Attribute-based transfer learning for object categorization with zero/one training example. ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V

A discriminative latent model of object classes and attributes. ECCV'10 Proceedings of the 11th European conference on Computer vision: Part V

CENTRIST: A Visual Descriptor for Scene Categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence

Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News. IEEE Transactions on Multimedia

Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study. IEEE Transactions on Multimedia

Scene recognition and weakly supervised object localization with deformable part-based models. ICCV '11 Proceedings of the 2011 International Conference on Computer Vision

Relative attributes. ICCV '11 Proceedings of the 2011 International Conference on Computer Vision

Abstract

This paper introduces a new image representation for scene recognition, in which an image is described by the response maps of object part filters. The part filters are learned from existing datasets with object location annotations, using deformable part-based models trained by latent SVM [1]. Since different objects may contain similar parts, we describe a method that uses a semantic hierarchy to automatically identify and merge filters shared by multiple objects. The merged hybrid filters are then applied to new images. Our proposed representation, called Hybrid-Parts, is generated by pooling the response maps of the hybrid filters. In contrast to previous scene recognition approaches that adopted object-level detections as feature inputs, we harness filter responses of object parts, which enable a richer and finer-grained representation. Using the hybrid filters also yields a more compact representation than directly using all the original part filters. Through extensive experiments on several scene recognition benchmarks, we demonstrate that Hybrid-Parts outperforms recent state-of-the-art methods, and that combining it with standard low-level features such as the GIST descriptor can lead to further improvements.
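
The idea of pooling filter response maps into a fixed-length descriptor can be illustrated with a minimal sketch. The abstract does not specify the pooling scheme, so the max pooling over a small spatial grid used here is an assumption, as are the function names (`response_map`, `pool_response`, `hybrid_parts_descriptor`), the random stand-in filters, and the toy feature map. The sketch does not cover filter learning or the hierarchy-based merging step.

```python
import numpy as np

def response_map(features, part_filter):
    """Cross-correlate one part filter with a feature map (valid positions only)."""
    H, W, _ = features.shape
    h, w, _ = part_filter.shape
    out = np.empty((H - h + 1, W - w + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # Dot product between the filter and the window under it.
            out[y, x] = np.sum(features[y:y + h, x:x + w] * part_filter)
    return out

def pool_response(resp, levels=(1, 2)):
    """Max-pool a response map over an n x n grid for each n in `levels`.

    Assumes the response map has at least n rows and n columns.
    """
    pooled = []
    for n in levels:
        ys = np.linspace(0, resp.shape[0], n + 1).astype(int)
        xs = np.linspace(0, resp.shape[1], n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = resp[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
                pooled.append(cell.max())
    return pooled

def hybrid_parts_descriptor(features, filters, levels=(1, 2)):
    """Concatenate the pooled responses of every filter into one vector."""
    vec = []
    for f in filters:
        vec.extend(pool_response(response_map(features, f), levels))
    return np.asarray(vec)

# Toy usage: an 8x8 feature map with 4 channels and 5 random 3x3 filters.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 4))
filters = [rng.standard_normal((3, 3, 4)) for _ in range(5)]
descriptor = hybrid_parts_descriptor(features, filters)
print(descriptor.shape)  # (25,): 5 filters x (1 + 4) pooled cells
```

In this sketch the descriptor length grows linearly with the number of filters, which is consistent with the abstract's point that merging shared part filters gives a more compact representation.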