Learning cross-modality similarity for multinomial data

Authors:
Yangqing Jia; Mathieu Salzmann; Trevor Darrell
Affiliations:
UC Berkeley EECS, USA; TTI-Chicago, USA; UC Berkeley EECS, USA
Venue:
ICCV '11 Proceedings of the 2011 International Conference on Computer Vision
Year:
2011

Abstract

Many applications involve multiple modalities, such as text and images, that describe the problem of interest. To leverage the information present in all the modalities, one must model the relationships between them. Some techniques have been proposed to tackle this problem, but they are either restricted to words that describe visual objects or require full correspondences between the different modalities. As a consequence, they cannot handle more realistic scenarios in which a narrative text is only loosely related to an image and only a few image-text pairs are available. In this paper, we propose a model that addresses both of these challenges. Our model can be seen as a Markov random field of topic models, which connects the documents based on their similarity. As a result, the topics learned with our model are shared across connected documents, thus encoding the relations between the different modalities. We demonstrate the effectiveness of our model for image retrieval from loosely related text.
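To make the idea of "a Markov random field of topic models that connects documents based on their similarity" more concrete, here is a minimal, generic sketch. It is not the paper's formulation, and every name in it is hypothetical. It assumes that documents are linked when the cosine similarity of their feature vectors exceeds a hypothetical `threshold`. It also assumes that the MRF couples linked documents through a simple pairwise energy that penalizes differences between their topic proportions, weighted by a hypothetical `weight`. Low energy means that connected documents use the same topics, which is the intuition behind sharing topics across modalities.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def connect_documents(features, threshold):
    """Hypothetical graph construction: link every pair of documents whose
    feature vectors have cosine similarity >= threshold (assumed criterion)."""
    return [
        (i, j)
        for i, j in combinations(range(len(features)), 2)
        if cosine(features[i], features[j]) >= threshold
    ]


def mrf_energy(thetas, edges, weight=1.0):
    """Generic pairwise MRF energy over per-document topic proportions.

    thetas: one topic-proportion vector per document (text or image).
    edges:  (i, j) pairs of connected documents.
    weight: hypothetical coupling strength between connected documents.
    The energy is the weighted sum of squared Euclidean distances between the
    topic proportions of connected documents, so it is lowest when linked
    documents share the same topics.
    """
    energy = 0.0
    for i, j in edges:
        energy += weight * sum((a - b) ** 2 for a, b in zip(thetas[i], thetas[j]))
    return energy


# Usage: documents 0 and 1 are similar, so they get linked; document 2 does not.
features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
edges = connect_documents(features, threshold=0.9)
thetas = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
print(edges, mrf_energy(thetas, edges))
```

In a full model, an energy of this kind would act as a prior on the topic proportions inside each document's topic model. The choice of similarity measure and of potential is where a real method would differ from this sketch.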