Semantic combination of textual and visual information in multimedia retrieval

Authors:
Stéphane Clinchant;Julien Ah-Pine;Gabriela Csurka
Affiliations:
Xerox Research Centre Europe, chemin de Maupertuis, Meylan, France;University of Lyon, avenue Pierre Mendès France, Bron, France;Xerox Research Centre Europe, chemin de Maupertuis, Meylan, France
Venue:
Proceedings of the 1st ACM International Conference on Multimedia Retrieval
Year:
2011

Abstract

This paper introduces a set of techniques, which we call semantic combination, for efficiently fusing text and image retrieval systems in the context of multimedia information access. These techniques stem from the observation that image and textual queries are expressed at different semantic levels and that a single image query is often ambiguous. Overall, the semantic combination techniques overcome a conceptual barrier rather than a technical one: they can be seen as a combination of late fusion and image reranking. Although simple, this approach has not been used before. We assess the proposed techniques against late and cross-media fusion on four different ImageCLEF datasets. Compared to late fusion, performance increases significantly on two datasets and remains similar on the other two.
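
The idea of combining late fusion with image reranking can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the min-max normalization, the cutoff k, and the mixing weight alpha are all assumptions made for illustration. The sketch restricts the image scores to the documents ranked best by the text system (the reranking step) and then mixes the two normalized scores linearly (the late fusion step).

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def semantic_combination(
    text_scores: Dict[str, float],
    image_scores: Dict[str, float],
    k: int = 3,        # hypothetical cutoff on the text ranking
    alpha: float = 0.5,  # hypothetical weight of the text score
) -> List[Tuple[str, float]]:
    """Rerank the top-k text results using a mix of text and image scores."""
    # Reranking step: only documents the text system ranks in its top k
    # are kept, so visually similar but off-topic images are discarded.
    top_k = sorted(text_scores, key=text_scores.get, reverse=True)[:k]

    # Normalize both score sets over the same candidate pool.
    text_norm = min_max({d: text_scores[d] for d in top_k})
    image_norm = min_max({d: image_scores.get(d, 0.0) for d in top_k})

    # Late fusion step: linear combination of the normalized scores.
    fused = {d: alpha * text_norm[d] + (1 - alpha) * image_norm[d] for d in top_k}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy scores, invented for illustration only.
    text = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.5}
    image = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}
    for doc, score in semantic_combination(text, image):
        print(doc, round(score, 3))
```

In this toy example, document "d" has the highest image score but is dropped because it falls outside the text top k, which is the behavior that distinguishes this scheme from plain late fusion over all documents.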