A new approach to cross-modal multimedia retrieval

  • Authors:
  • Nikhil Rasiwasia, Jose Costa Pereira, Emanuele Coviello, Gabriel Doyle, Gert R. G. Lanckriet, Roger Levy, Nuno Vasconcelos

  • Affiliations:
  • University of California, San Diego, USA (all authors)

  • Venue:
  • Proceedings of the ACM International Conference on Multimedia
  • Year:
  • 2010

Abstract

The problem of jointly modeling the text and image components of multimedia documents is studied. The text component is represented as a sample from a hidden topic model, learned with latent Dirichlet allocation, and images are represented as bags of visual (SIFT) features. Two hypotheses are investigated: 1) that there is a benefit to explicitly modeling correlations between the two components, and 2) that this modeling is more effective in feature spaces with higher levels of abstraction. Correlations between the two components are learned with canonical correlation analysis; abstraction is achieved by representing text and images at a more general, semantic level. The two hypotheses are studied in the context of cross-modal document retrieval, which includes retrieving the text that most closely matches a query image, or the images that most closely match a query text. It is shown that accounting for cross-modal correlations and semantic abstraction both improve retrieval accuracy. The cross-modal model is also shown to outperform state-of-the-art image retrieval systems on a unimodal retrieval task.
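
The core pipeline the abstract describes, paired text and image features projected into a common subspace with canonical correlation analysis (CCA) and then matched across modalities by similarity in that subspace, can be sketched compactly. The snippet below is a minimal illustration using scikit-learn with synthetic placeholder features; in the actual system the text vectors would be LDA topic posteriors and the image vectors quantized-SIFT histograms. All dimensions and data here are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of CCA-based cross-modal retrieval (correlation matching).
# Synthetic stand-ins are used for the paired features; real features would
# come from an LDA topic model (text) and a SIFT visual codebook (images).
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)

# Hypothetical sizes: 200 paired documents, 10 text topics, 128 visual words.
n_docs, n_topics, n_visual_words = 200, 10, 128
text_feats = rng.dirichlet(np.ones(n_topics), size=n_docs)   # placeholder topic posteriors
image_feats = rng.random((n_docs, n_visual_words))           # placeholder BoW histograms

# Learn maximally correlated projections of the two feature spaces.
cca = CCA(n_components=8)
text_proj, image_proj = cca.fit_transform(text_feats, image_feats)

# Image-to-text retrieval: take the first image as the query and rank all
# texts by cosine similarity in the shared CCA subspace.
query_proj = image_proj[:1]
scores = cosine_similarity(query_proj, text_proj)[0]
ranked_text_ids = np.argsort(scores)[::-1]
print("Top-5 texts for image 0:", ranked_text_ids[:5])
```

Text-to-image retrieval is the same procedure with the roles swapped: project the query text into the shared subspace and rank `image_proj` rows by similarity.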