VISOR: towards on-the-fly large-scale object category retrieval

Authors:
Ken Chatfield; Andrew Zisserman
Affiliations:
University of Oxford, United Kingdom; University of Oxford, United Kingdom
Venue:
ACCV'12 Proceedings of the 11th Asian conference on Computer Vision - Volume Part II
Year:
2012

Abstract

This paper addresses the problem of object category retrieval in large unannotated image datasets. Our aim is to enable both fast learning of an object category model and fast retrieval over the dataset. With these elements we show that new visual concepts can be learnt on-the-fly from a text description, and that images of that category can then be retrieved from the dataset in real time. To this end we compare state-of-the-art encoding methods and introduce a novel cascade retrieval architecture, focusing on the best trade-off between three performance measures that matter for a real-time system of this kind: (i) class accuracy, (ii) memory footprint, and (iii) speed. We show that an on-the-fly system is possible and compare its performance (using noisy training images) to that obtained with carefully curated images. For this evaluation we use the VOC 2007 dataset together with 100k images from ImageNet as distractors.
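
The two ingredients named in the abstract, fast learning of a category model and fast retrieval through a cascade, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the encoding, or how the cascade stages are built. The sketch assumes a linear classifier trained by Pegasos-style stochastic sub-gradient descent, and it stands in for a cheap first stage by scoring only the leading `coarse_dims` feature dimensions. The function names `train_linear_classifier` and `cascade_retrieve`, and all parameters, are hypothetical.

```python
import numpy as np


def train_linear_classifier(pos, neg, lam=1e-2, epochs=20, seed=0):
    """Train a linear model on hinge loss with Pegasos-style SGD.

    pos, neg: arrays of shape (n, d) holding (possibly noisy) positive
    and negative training features. Returns a weight vector of shape (d,).
    Assumption: the abstract does not state which learner is used.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([pos, neg])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * (X[i] @ w)     # margin under the current weights
            w *= 1.0 - eta * lam           # shrink (regularisation step)
            if margin < 1.0:               # hinge loss is active
                w += eta * y[i] * X[i]
    return w


def cascade_retrieve(w, features, coarse_dims, shortlist, top_k):
    """Two-stage ranking: cheap shortlist, then exact re-ranking.

    Stage 1 scores every item with only the first `coarse_dims` dimensions
    (a hypothetical stand-in for a cheap, compact representation).
    Stage 2 re-scores the shortlisted items with the full feature vector.
    Returns (indices, scores) of the top_k items, best first.
    """
    coarse = features[:, :coarse_dims] @ w[:coarse_dims]
    candidates = np.argsort(-coarse)[:shortlist]
    exact = features[candidates] @ w
    order = np.argsort(-exact)[:top_k]
    return candidates[order], exact[order]
```

In this sketch the first stage touches only part of each feature vector for the whole dataset, while the full-dimensional score is computed for `shortlist` items only, which is one way to trade accuracy against memory and speed.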