Joint visual-text modeling for automatic retrieval of multimedia documents

  • Authors and Affiliations:
  • G. Iyengar (IBM TJ Watson Research Center); P. Duygulu (Bilkent University); S. Feng (University of Massachusetts, Amherst); P. Ircing (Univ. West Bohemia); S. P. Khudanpur (Johns Hopkins University); D. Klakow (Saarland University); M. R. Krause (Georgetown University); R. Manmatha (University of Massachusetts, Amherst); H. J. Nock (IBM TJ Watson Research Center); D. Petkova (Mt. Holyoke College); B. Pytlik (Johns Hopkins University); P. Virga (Johns Hopkins University)

  • Venue:
  • Proceedings of the 13th annual ACM international conference on Multimedia
  • Year:
  • 2005

Abstract

In this paper we describe a novel approach to jointly modeling the text and visual components of multimedia documents for the purpose of information retrieval (IR). We propose a framework in which individual components are developed to model different relationships between documents and queries and are then combined into a joint retrieval framework. In state-of-the-art systems, the norm is a late combination of two independent systems: one analyzing only the text portion of such documents, and the other analyzing the visual portion without leveraging any knowledge acquired during text processing. Such systems rarely exceed the performance of any single modality (i.e., text or video) on information retrieval tasks. Our experiments indicate that allowing a rich interaction between the modalities results in a significant improvement in performance over any single modality. We demonstrate these results on the TRECVID03 corpus, which comprises 120 hours of broadcast news video. Our results show an improvement of over 14% in IR performance over the best reported text-only baseline and rank among the best results reported on this corpus.
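
To make the contrast between late combination and a joint text-visual score concrete, here is a minimal, purely illustrative Python sketch. The scoring functions, field names, and the interaction weight `boost` are assumptions introduced for illustration; they are not the models described in the paper.

```python
# Toy contrast: late fusion of independent modality scores vs. a score in
# which the visual evidence is allowed to interact with the text evidence.
# All scoring functions below are illustrative placeholders, not the paper's models.

import math


def text_score(query_terms, doc_terms):
    """Toy unigram overlap score between query terms and document (ASR) terms."""
    doc_len = max(len(doc_terms), 1)
    return sum(math.log(1 + doc_terms.count(t) / doc_len) for t in query_terms)


def visual_score(query_concepts, doc_concepts):
    """Toy score from detected visual concept confidences (e.g., 'sky', 'smoke')."""
    return sum(doc_concepts.get(c, 0.0) for c in query_concepts)


def late_fusion_score(query, doc, alpha=0.5):
    """Late combination: the two modality scores are computed independently
    and only mixed at the very end."""
    return (alpha * text_score(query["terms"], doc["asr_terms"])
            + (1 - alpha) * visual_score(query["concepts"], doc["concepts"]))


def joint_score(query, doc, alpha=0.5, boost=0.25):
    """Joint-style score: the visual contribution is reweighted by how well the
    text modality already matches, so the modalities interact rather than being
    combined blindly (a hypothetical stand-in for the paper's joint framework)."""
    t = text_score(query["terms"], doc["asr_terms"])
    v = visual_score(query["concepts"], doc["concepts"])
    return alpha * t + (1 - alpha) * v * (1 + boost * t)


if __name__ == "__main__":
    query = {"terms": ["rocket", "launch"], "concepts": ["sky", "smoke"]}
    docs = [
        {"id": "shot_1", "asr_terms": ["rocket", "launch", "countdown"],
         "concepts": {"sky": 0.9, "smoke": 0.7}},
        {"id": "shot_2", "asr_terms": ["weather", "report"],
         "concepts": {"sky": 0.95, "smoke": 0.1}},
    ]
    for d in docs:
        print(d["id"],
              round(late_fusion_score(query, d), 3),
              round(joint_score(query, d), 3))
```

In this toy example, the joint score rewards shots where text and visual evidence agree more strongly than a fixed-weight late fusion would, which is the kind of cross-modal interaction the abstract argues for.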