The objective of this work is to automatically generate a large number of images for a specified object class. A multimodal approach employing text, metadata, and visual features is used to gather many high-quality images from the Web. Candidate images are obtained by a text-based Web search querying on the object identifier (e.g., the word penguin). The Webpages and the images they contain are downloaded. The task is then to remove irrelevant images and rerank the remainder. First, the images are reranked based on the text surrounding the image and on metadata features; a number of methods are compared for this reranking. Second, the top-ranked images are used as (noisy) training data and an SVM visual classifier is learned to improve the ranking further. We investigate the sensitivity of the cross-validation procedure to this noisy training data. The principal novelty of the overall method is in combining text/metadata and visual features in order to achieve a completely automatic ranking of the images. Examples are given for a selection of animals, vehicles, and other classes, totaling 18 classes. The results are assessed by precision/recall curves on ground-truth annotated data and by comparison to previous approaches, including those of Berg and Forsyth and of Fergus et al.
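The second stage described above, learning an SVM visual classifier from noisy text-ranked images and using its scores to rerank candidates, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the feature vectors are synthetic stand-ins for visual descriptors, and the class sizes and SVM settings are arbitrary choices.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in features: in the actual method these would be
# visual descriptors extracted from the downloaded images.
pos = rng.normal(loc=1.0, scale=1.0, size=(100, 16))   # noisy positives: top text/metadata-ranked images
neg = rng.normal(loc=-1.0, scale=1.0, size=(100, 16))  # negatives: e.g. images drawn from other classes

X = np.vstack([pos, neg])
y = np.hstack([np.ones(100), np.zeros(100)])

# Train an SVM on the noisy training set.
clf = SVC(kernel="rbf").fit(X, y)

# Rerank the remaining candidate images by their signed distance
# from the decision boundary (higher score = more class-like).
candidates = rng.normal(loc=0.5, scale=1.5, size=(50, 16))
scores = clf.decision_function(candidates)
reranked = np.argsort(-scores)  # best-first ordering of candidate indices
```

In this sketch the final ranking is just the descending order of `decision_function` scores; the robustness question the abstract raises is that `pos` contains mislabeled images, which cross-validated parameter selection must tolerate.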