Multimodal feature generation framework for semantic image classification

Authors:
Amel Znaidia;Aymen Shabou;Adrian Popescu;Hervé le Borgne;Céline Hudelot
Affiliations:
CEA, LIST, Vision & Content Engineering Laboratory, Gif-sur-Yvette, France;CEA, LIST, Vision & Content Engineering Laboratory, Gif-sur-Yvette, France;CEA, LIST, Vision & Content Engineering Laboratory, Gif-sur-Yvette, France;CEA, LIST, Vision & Content Engineering Laboratory, Gif-sur-Yvette, France;Applied Mathematics & Systems Laboratory, Antony, France
Venue:
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval
Year:
2012

Abstract

The automatic attribution of semantic labels to unlabeled or weakly labeled images has received considerable attention but, given the complexity of the problem, remains a hard research topic. Here we propose a unified classification framework that seamlessly combines textual and visual information. Unlike most previous works, we use computer vision techniques as inspiration for processing textual information. To do so, we consider two complementary types of tag similarity, computed respectively from a conceptual hierarchy and from data collected from a photo-sharing platform. Visual content is processed using recent bag-of-visual-words feature generation techniques. A central contribution of our work is to perform the coding step of the general bag-of-words framework with these tag similarities and to aggregate the resulting tag codes by max-pooling into a single representative vector (signature). Final image annotations are obtained via late fusion, in which the three modalities (two text-based and one visual) are merged during the classification step. Experimental results on the Pascal VOC 2007 and MIR Flickr datasets show an improvement over state-of-the-art methods, while significantly decreasing the computational complexity of the learning system.
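The abstract only outlines the tag-coding pipeline, so the sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each image tag is coded by its similarity to every entry of a reference tag dictionary, that the per-tag codes are max-pooled into one signature, and that late fusion is a weighted average of per-modality classifier scores. The function names `code_tags`, `max_pool`, `late_fusion`, the dictionary, and the toy similarity values are all hypothetical; the paper's actual similarity measures, coding scheme, and fusion rule may differ.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical similarity signature: (tag, dictionary_tag) -> score in [0, 1].
Similarity = Callable[[str, str], float]


def code_tags(tags: Sequence[str], dictionary: Sequence[str],
              sim: Similarity) -> List[List[float]]:
    """Coding step (assumed scheme): the k-th entry of a tag's code is
    its similarity to the k-th dictionary tag."""
    return [[sim(t, d) for d in dictionary] for t in tags]


def max_pool(codes: List[List[float]], k: int) -> List[float]:
    """Pooling step: for each dictionary tag, keep the strongest response
    over all of the image's tags. An untagged image yields all zeros."""
    if not codes:
        return [0.0] * k
    return [max(code[j] for code in codes) for j in range(k)]


def late_fusion(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Late fusion (assumed rule): weighted mean of per-modality
    classifier scores, e.g. two text-based and one visual classifier."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


# Toy, made-up similarity table standing in for a hierarchy-based or
# co-occurrence-based measure (values are illustrative only).
TOY_SIM: Dict[frozenset, float] = {
    frozenset({"dog", "animal"}): 0.8,
    frozenset({"cat", "animal"}): 0.7,
    frozenset({"car", "vehicle"}): 0.9,
}


def toy_sim(a: str, b: str) -> float:
    """Symmetric lookup; identical tags are maximally similar."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


dictionary = ["animal", "vehicle", "beach"]
codes = code_tags(["dog", "cat"], dictionary, toy_sim)
signature = max_pool(codes, len(dictionary))
print(signature)  # [0.8, 0.0, 0.0]

# Hypothetical scores from three per-modality classifiers for one concept.
fused = late_fusion([0.9, 0.6, 0.3], [1.0, 1.0, 1.0])
print(round(fused, 3))  # 0.6
```

Max-pooling keeps the signature length fixed at the dictionary size regardless of how many tags an image carries, which is what lets a standard per-concept classifier consume it; the uniform fusion weights here are only a placeholder for weights that would normally be tuned on validation data.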