Exploiting multi-modal interactions: a unified framework

  • Authors:
  • Ming Li; Xiao-Bing Xue; Zhi-Hua Zhou

  • Affiliations:
  • National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China (all authors)

  • Venue:
  • IJCAI'09: Proceedings of the 21st International Joint Conference on Artificial Intelligence
  • Year:
  • 2009

Abstract

Given an imagebase of tagged images, four types of tasks can be executed, i.e., content-based image retrieval, image annotation, text-based image retrieval, and query expansion. For any of these tasks, the similarity between objects of the concerned type is essential. In this paper, we propose a framework that tackles these four tasks from a unified view. The essence of the framework is to estimate similarities by exploiting the interactions between objects of different modalities. Experiments show that the proposed method can improve similarity estimation, and that, based on the improved similarity estimation, some simple methods can achieve better performance than some state-of-the-art techniques.
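To make the idea of cross-modal similarity estimation concrete, here is a minimal, hypothetical sketch: it is not the authors' actual algorithm, but it illustrates how interactions between the two modalities (images and tags) can inform similarity in each. The co-occurrence matrix `A` and the helper names (`image_similarity`, `tag_similarity`) are assumptions introduced only for illustration.

```python
import numpy as np


def normalize_rows(M):
    """L2-normalize each row, guarding against all-zero rows."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return M / norms


def image_similarity(A):
    """Image-image similarity estimated from shared tags (rows of A are images)."""
    R = normalize_rows(A.astype(float))
    return R @ R.T


def tag_similarity(A):
    """Tag-tag similarity estimated from shared images (columns of A are tags)."""
    C = normalize_rows(A.T.astype(float))
    return C @ C.T


if __name__ == "__main__":
    # Toy imagebase: 4 images x 5 tags; A[i, j] = 1 if image i carries tag j.
    A = np.array([
        [1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0],
        [0, 0, 1, 1, 0],
        [0, 0, 0, 1, 1],
    ])
    print("image-image similarity:\n", image_similarity(A).round(2))
    print("tag-tag similarity:\n", tag_similarity(A).round(2))
```

In this toy setup, images become similar to the extent that they share tags, and tags become similar to the extent that they annotate the same images; a similarity of either kind can then drive retrieval, annotation, or query expansion in the other modality.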