The automatic attribution of semantic labels to unlabeled or weakly labeled images has received considerable attention but, given the complexity of the problem, remains a hard research topic. Here we propose a unified classification framework that combines textual and visual information seamlessly. Unlike most recent work, we use computer vision techniques as inspiration for processing textual information. To do so, we consider two complementary types of tag similarity, computed respectively from a conceptual hierarchy and from data collected from a photo-sharing platform. Visual content is processed using recent techniques for bag-of-visual-words feature generation. A central contribution of our work is to perform the coding step of the general bag-of-words framework with these similarities and to aggregate the resulting tag codes by max-pooling into a single representative vector (signature). Final image annotations are obtained via late fusion, in which the three modalities (two text-based and one visual) are merged during the classification step. Experimental results on the Pascal VOC 2007 and MIR Flickr datasets show an improvement over state-of-the-art methods, while significantly decreasing the computational complexity of the learning system.
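The tag-coding, max-pooling, and late-fusion steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each tag is coded as a row of a precomputed tag-to-codeword similarity matrix, and it uses a weighted average of per-modality classifier scores as a stand-in for the fusion rule, which the abstract does not specify. The function names `tag_signature` and `late_fusion` and the toy matrix are hypothetical.

```python
import numpy as np


def tag_signature(similarity, image_tag_ids):
    """Build one signature per image from its tags.

    similarity: array of shape (n_tags, n_codewords); row t is the code of
        tag t, i.e. its similarity to each codeword (assumed precomputed,
        e.g. from a concept hierarchy or from tag co-occurrence data).
    image_tag_ids: indices of the tags attached to the image.
    """
    if len(image_tag_ids) == 0:
        # An untagged image gets an all-zero signature (an assumption).
        return np.zeros(similarity.shape[1])
    codes = similarity[image_tag_ids]   # one code vector per tag
    return codes.max(axis=0)            # max-pooling over the image's tags


def late_fusion(scores_per_modality, weights=None):
    """Merge per-modality classifier scores by a weighted average.

    scores_per_modality: array-like of shape (n_modalities, n_classes).
    weights: optional per-modality weights; uniform if omitted.
    """
    scores = np.asarray(scores_per_modality, dtype=float)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    return np.average(scores, axis=0, weights=weights)


# Toy data: 3 tags coded against 3 codewords.
sim = np.array([[1.0, 0.2, 0.0],
                [0.1, 0.9, 0.3],
                [0.0, 0.4, 0.8]])
signature = tag_signature(sim, [0, 2])

# Scores from two text-based classifiers and one visual classifier.
fused = late_fusion([[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]])
```

In this sketch, max-pooling keeps for each codeword the strongest response among the image's tags, so the signature length depends only on the codebook size and not on how many tags the image carries.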