Ensemble of exemplar-SVMs for object detection and beyond

Authors:
Tomasz Malisiewicz;Abhinav Gupta;Alexei A. Efros
Affiliations:
Carnegie Mellon University, USA;Carnegie Mellon University, USA;Carnegie Mellon University, USA
Venue:
ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
Year:
2011

Cited by (20):
Data-driven visual similarity for cross-domain image matching. Proceedings of the 2011 SIGGRAPH Asia Conference
Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning. Foundations and Trends® in Computer Graphics and Vision
Automatically characterizing places with opportunistic crowdsensing using smartphones. Proceedings of the 2012 ACM Conference on Ubiquitous Computing
Latent hough transform for object detection. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III
Ensemble partitioning for unsupervised image categorization. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part III
Segmentation over detection by coupled global and local sparse representations. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part V
Mixture component identification and learning for visual recognition. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part VI
Learning to match images in large-scale collections. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part I
Action recognition with exemplar based 2.5d graph matching. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part IV
Multi-component models for object detection. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part IV
Discriminative decorrelation for clustering and classification. ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part IV
Superparsing. International Journal of Computer Vision
Data decomposition and spatial mixture modeling for part based model. ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part I
Understanding the coverage and scalability of place-centric crowdsensing. Proceedings of the 2013 ACM international joint conference on Pervasive and ubiquitous computing
Beyond bag of words: image representation in sub-semantic space. Proceedings of the 21st ACM international conference on Multimedia
Querying for video events by semantic signatures from few examples. Proceedings of the 21st ACM international conference on Multimedia
Hand segmentation for gesture recognition in EGO-vision. Proceedings of the 3rd ACM international workshop on Interactive multimedia on mobile & portable devices
Integrated instance- and class-based generative modeling for text classification. Proceedings of the 18th Australasian Document Computing Symposium
Multi-Max-Margin Support Vector Machine for multi-source human action recognition. Neurocomputing
Boosting masked dominant orientation templates for efficient object detection. Computer Vision and Image Understanding

Abstract

This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. The method is based on training a separate linear SVM classifier for every exemplar in the training set. Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives. While each detector is quite specific to its exemplar, we empirically observe that an ensemble of such Exemplar-SVMs offers surprisingly good generalization. Our performance on the PASCAL VOC detection task is on par with the much more complex latent part-based model of Felzenszwalb et al., at only a modest computational cost increase. But the central benefit of our approach is that it creates an explicit association between each detection and a single training exemplar. Because most detections show good alignment to their associated exemplar, it is possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detections, which can then be used as part of overall scene understanding.
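
The training scheme described in the abstract (one linear SVM per exemplar, a single positive against many negatives, then an ensemble whose winning detector points back to one exemplar) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D features, the metadata strings, the names `train_exemplar_svm` and `best_exemplar`, and the values of `c_pos`, `c_neg`, `lam`, `lr`, and `epochs` are all assumptions made for illustration. The solver is plain full-batch subgradient descent on a weighted hinge loss, and any calibration of scores across detectors is omitted.

```python
def dot(w, x):
    """Inner product of two equal-length lists of floats."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_exemplar_svm(exemplar, negatives, c_pos=1.0, c_neg=0.05,
                       lam=0.01, lr=0.1, epochs=200):
    """Fit one linear SVM (w, b) from a single positive and many negatives.

    Minimizes lam/2 * ||w||^2 plus a hinge loss weighted by c_pos for the
    exemplar and c_neg for each negative (all values are illustrative).
    """
    dim = len(exemplar)
    w, b = [0.0] * dim, 0.0
    data = [(exemplar, 1.0, c_pos)] + [(x, -1.0, c_neg) for x in negatives]
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]   # gradient of the regularizer
        grad_b = 0.0
        for x, y, c in data:
            if y * (dot(w, x) + b) < 1.0:  # margin violated: hinge is active
                for i in range(dim):
                    grad_w[i] -= c * y * x[i]
                grad_b -= c * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def best_exemplar(ensemble, x):
    """Return (index, score) of the highest-scoring exemplar-SVM for x."""
    scores = [dot(w, x) + b for w, b in ensemble]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


# Hypothetical toy data: two exemplars, each with metadata attached.
exemplars = [[1.0, 0.0], [0.0, 1.0]]
metadata = ["mask_of_exemplar_0", "mask_of_exemplar_1"]
negatives = [[0.0, 0.0], [-1.0, -1.0], [-0.5, -0.5], [0.3, 0.3]]

# One detector per exemplar; the negatives are shared.
ensemble = [train_exemplar_svm(e, negatives) for e in exemplars]

# A detection is tied to the exemplar whose SVM fires most strongly,
# so that exemplar's metadata can be copied onto the detection.
idx, score = best_exemplar(ensemble, [0.9, 0.1])
transferred = metadata[idx]
```

Because every detection carries the index of the exemplar that produced it, the metadata lookup is a plain list access. In a real detector the features would be high-dimensional descriptors computed over sliding windows rather than 2-D points.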