Image collections on the Internet and from other sources often come with attached text descriptions. This work considers the problem of fusing two data modalities, visual content and text keywords, to enable a flexible image indexing scheme. The proposed strategy learns multimodal relationships using matrix reconstruction principles and factorization algorithms, which allows data from one modality to be represented in the space of the other. We further exploit this exchangeability property to fuse the modalities in any of the representation spaces by backprojecting predicted data to the input space. An experimental evaluation was carried out on the Corel 5K and MIRFlickr data sets using a query-by-example paradigm in which the query images have no attached text. Experimental results show that the proposed strategy finds multimodal links in the data and that these links can be exploited to improve image retrieval performance.
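The abstract does not say which factorization algorithm is used. The code below is a minimal sketch that assumes a joint nonnegative matrix factorization (NMF) trained with Lee-Seung multiplicative updates. Visual features and keyword vectors are stacked so that both modalities share one latent code per image. A text-free query image is first encoded using only the visual block of the basis. Keywords are then predicted from that code, and the concatenated visual-plus-predicted-text vector is re-encoded against the full basis before ranking. The helper names (`nmf`, `encode`), the latent dimension, and the random toy data are hypothetical and serve only as illustration. This is not the authors' algorithm.

```python
import numpy as np

def nmf(V, k, iters=500, seed=0, eps=1e-9):
    """Assumed solver: Lee-Seung multiplicative updates for V ~ W @ H (Frobenius loss)."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update codes, W fixed
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update basis, H fixed
    return W, H

def encode(x, W, iters=500, eps=1e-9):
    """Find a nonnegative latent code h with x ~ W @ h, keeping the basis W fixed."""
    x = x.reshape(-1, 1)
    h = np.full((W.shape[1], 1), 1.0 / W.shape[1])
    for _ in range(iters):
        h *= (W.T @ x) / (W.T @ W @ h + eps)
    return h

# Hypothetical toy collection: each column is one image.
rng = np.random.default_rng(1)
n_images, d_vis, d_txt, k = 40, 12, 6, 4
X_vis = rng.random((d_vis, n_images))                          # visual features
X_txt = (rng.random((d_txt, n_images)) > 0.7).astype(float)    # binary keyword annotations

# Joint factorization: both modalities share the latent codes H.
W, H = nmf(np.vstack([X_vis, X_txt]), k)
W_vis, W_txt = W[:d_vis], W[d_vis:]

# Query by example: visual features only, no text.
q_vis = X_vis[:, 0]
h_q = encode(q_vis, W_vis)           # represent the visual query in the latent space
t_pred = (W_txt @ h_q).ravel()       # predicted keyword scores (text-modality view)

# Fuse: concatenate the visual query with its predicted text and re-encode with the full basis.
fused = np.concatenate([q_vis, t_pred])
h_fused = encode(fused, W)

# Rank the collection by cosine similarity in the latent space.
scores = (H.T @ h_fused).ravel() / (
    np.linalg.norm(H, axis=0) * np.linalg.norm(h_fused) + 1e-12
)
ranking = np.argsort(-scores)
```

Ranking in the shared latent space is only one possible design choice. The reconstruction `W @ h_fused` could instead be compared against the collection in the visual or text input space.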