The text associated with images provides valuable semantic information about image content that can hardly be captured by low-level visual features. In this paper, we propose a novel multimodal approach that automatically predicts the visual concepts of images through an effective fusion of textual and visual features. In contrast to the classical Bag-of-Words approach, which relies only on term frequencies, we propose a novel textual descriptor, the Histogram of Textual Concepts (HTC), which accounts for the relatedness of semantic concepts when accumulating the contributions of the words of an image caption toward a dictionary of concepts. In addition to the popular SIFT-like features, we also evaluate a set of mid-level visual features that aim to characterize the harmony, dynamism and aesthetic quality of visual content in relation to affective concepts. Finally, we propose a novel selective weighted late fusion (SWLF) scheme that automatically selects and weights the scores of the best features according to the concept to be classified. This scheme proves particularly useful for the image annotation task in a multi-label scenario. Extensive experiments were carried out on the MIR FLICKR image collection within the ImageCLEF 2011 photo annotation challenge. Our best model, a late fusion of textual and visual features, achieved a MiAP (Mean interpolated Average Precision) of 43.69% and ranked 2nd out of 79 runs. We also provide a comprehensive analysis of the experimental results and give some insights for future improvements.
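To make the two central ideas more concrete, here is a minimal sketch, not the authors' implementation. `histogram_of_textual_concepts` builds an HTC-style vector: instead of counting only exact word matches as Bag-of-Words does, each caption word adds its relatedness to every dictionary concept. `selective_weighted_late_fusion` keeps, for one concept, only the `top_n` features with the best validation average precision and combines their classifier scores with weights proportional to that precision. All function names, the toy relatedness table `TOY_RELATEDNESS` (standing in for a real semantic measure such as a WordNet-based one), and the AP-proportional weighting rule are hypothetical assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical toy relatedness table; a real system would use a semantic
# measure (e.g. one derived from WordNet) instead of hand-picked values.
TOY_RELATEDNESS: Dict[tuple, float] = {("puppy", "dog"): 0.8, ("beach", "sea"): 0.6}


def toy_relatedness(word: str, concept: str) -> float:
    """Return 1.0 for an exact match, else the toy table value (default 0.0)."""
    if word == concept:
        return 1.0
    return TOY_RELATEDNESS.get((word, concept), 0.0)


def histogram_of_textual_concepts(
    words: Sequence[str],
    concepts: Sequence[str],
    relatedness: Callable[[str, str], float],
) -> List[float]:
    """HTC-style histogram: bin j accumulates the relatedness of every
    caption word to concept j (any normalization step is omitted here)."""
    hist = [0.0] * len(concepts)
    for word in words:
        for j, concept in enumerate(concepts):
            hist[j] += relatedness(word, concept)
    return hist


def selective_weighted_late_fusion(
    scores: Dict[str, float],
    val_ap: Dict[str, float],
    top_n: int,
) -> float:
    """Fuse per-feature classifier scores for one image and one concept.

    scores: feature name -> classifier score for this image.
    val_ap: feature name -> validation average precision for this concept.
    Assumption: weights are the validation APs of the selected features,
    normalized to sum to one.
    """
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    # Select the top_n features ranked by validation AP for this concept.
    best = sorted(val_ap, key=val_ap.get, reverse=True)[:top_n]
    total = sum(val_ap[f] for f in best)
    if total <= 0:
        # Degenerate case: fall back to a plain average of the selected scores.
        return sum(scores[f] for f in best) / len(best)
    return sum(val_ap[f] / total * scores[f] for f in best)


if __name__ == "__main__":
    caption = ["puppy", "on", "beach"]
    dictionary = ["dog", "sea", "car"]
    # Exact-match Bag-of-Words over this dictionary would give all zeros.
    print(histogram_of_textual_concepts(caption, dictionary, toy_relatedness))  # [0.8, 0.6, 0.0]

    fused = selective_weighted_late_fusion(
        scores={"htc": 0.8, "sift": 0.2, "color": 0.9},
        val_ap={"htc": 0.5, "sift": 0.25, "color": 0.125},
        top_n=2,
    )
    print(round(fused, 3))  # 0.6
```

In the usage example, the weak `color` feature is dropped by the selection step even though its raw score is the highest, which illustrates why selecting per concept before weighting can matter in a multi-label setting. How many features to keep, and how exactly to derive the weights, would in practice be tuned on validation data.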