The objective of this work is to automatically generate a large number of images for a specified object class. A multimodal approach employing text, metadata, and visual features is used to gather many high-quality images from the Web. Candidate images are obtained by a text-based Web search querying on the object identifier (e.g., the word penguin). The Webpages and the images they contain are downloaded. The task is then to remove irrelevant images and rerank the remainder. First, the images are reranked based on the text surrounding the image and on metadata features; a number of methods are compared for this reranking. Second, the top-ranked images are used as (noisy) training data and an SVM visual classifier is learned to improve the ranking further. We investigate the sensitivity of the cross-validation procedure to this noisy training data. The principal novelty of the overall method is in combining text/metadata and visual features in order to achieve a completely automatic ranking of the images. Examples are given for a selection of animals, vehicles, and other classes, totaling 18 classes. The results are assessed by precision/recall curves on ground-truth annotated data and by comparison to previous approaches, including those of Berg and Forsyth and of Fergus et al.
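The second stage described above, learning an SVM visual classifier from noisy text-ranked images and using its scores to rerank candidates, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the feature vectors are synthetic stand-ins for visual descriptors, and the class sizes and SVM settings are arbitrary choices.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in features: in the actual method these would be
# visual descriptors extracted from the downloaded images.
pos = rng.normal(loc=1.0, scale=1.0, size=(100, 16))   # noisy positives: top text/metadata-ranked images
neg = rng.normal(loc=-1.0, scale=1.0, size=(100, 16))  # negatives: e.g. images drawn from other classes

X = np.vstack([pos, neg])
y = np.hstack([np.ones(100), np.zeros(100)])

# Train an SVM on the noisy training set.
clf = SVC(kernel="rbf").fit(X, y)

# Rerank the remaining candidate images by their signed distance
# from the decision boundary (higher score = more class-like).
candidates = rng.normal(loc=0.5, scale=1.5, size=(50, 16))
scores = clf.decision_function(candidates)
reranked = np.argsort(-scores)  # best-first ordering of candidate indices
```

In this sketch the final ranking is just the descending order of `decision_function` scores; the robustness question the abstract raises is that `pos` contains mislabeled images, which cross-validated parameter selection must tolerate.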