In this paper we describe a novel approach for jointly modeling the text and visual components of multimedia documents for the purpose of information retrieval (IR). We propose a framework in which individual components are developed to model different relationships between documents and queries and are then combined into a joint retrieval framework. In state-of-the-art systems, the norm is a late combination of two independent systems: one analyzes only the text part of such documents, and the other analyzes the visual part without leveraging any knowledge acquired in the text processing. Such systems rarely exceed the performance of any single modality (i.e., text or video) in information retrieval tasks. Our experiments indicate that allowing a rich interaction between the modalities results in a significant improvement in performance over any single modality. We demonstrate these results using the TRECVID03 corpus, which comprises 120 hours of broadcast news videos. Our results show an improvement of over 14% in IR performance over the best reported text-only baseline and rank among the best results reported on this corpus.