Fusion vs. two-stage for multimodal retrieval

Authors:
Avi Arampatzis;Konstantinos Zagoris;Savvas A. Chatzichristofis
Affiliations:
Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece;Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece;Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
Venue:
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
Year:
2011

Abstract

We compare two methods for retrieval from multimodal collections. The first is a score-based fusion of results retrieved visually and textually. The second is a two-stage method that visually re-ranks the top-K results retrieved textually. We discuss their underlying hypotheses and practical limitations, and conduct a comparative evaluation on a standardized snapshot of Wikipedia. Both methods are found to be significantly more effective than single-modality baselines, with no clear winner but with different robustness features. Nevertheless, two-stage retrieval provides efficiency benefits over fusion.
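
The two strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the normalization, the combination rule, or how K is chosen, so the min-max normalization, the linear weight `w`, the fixed `k`, and the function names `min_max`, `fuse`, and `two_stage` are all assumptions made for illustration. Keeping the documents below rank K in their text order is likewise an assumption.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> retrieval score (hypothetical format)


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1]; an assumed normalization, not from the paper."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def fuse(text: Scores, visual: Scores, w: float = 0.5) -> List[str]:
    """Score-based fusion: weighted sum of normalized text and visual scores.

    A document missing from one modality contributes 0 for that modality.
    Ties are broken by document id so the output is deterministic.
    """
    t, v = min_max(text), min_max(visual)
    docs = set(t) | set(v)
    combined = {d: w * t.get(d, 0.0) + (1 - w) * v.get(d, 0.0) for d in docs}
    return sorted(docs, key=lambda d: (-combined[d], d))


def two_stage(text: Scores, visual: Scores, k: int) -> List[str]:
    """Two-stage retrieval: rank by text, then re-rank the top k by visual score.

    Documents below rank k keep their text order (an assumption of this sketch).
    Documents that were not retrieved textually never enter the result.
    """
    by_text = sorted(text, key=lambda d: (-text[d], d))
    top, rest = by_text[:k], by_text[k:]
    reranked = sorted(top, key=lambda d: (-visual.get(d, 0.0), d))
    return reranked + rest


if __name__ == "__main__":
    text = {"a": 3.0, "b": 2.0, "c": 1.0}
    visual = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}
    print(fuse(text, visual))          # fusion can surface "d", found only visually
    print(two_stage(text, visual, 2))  # only "a" and "b" are visually re-ranked
```

In this sketch, `two_stage` reads visual scores only for the top `k` text results, whereas `fuse` needs a visual score for every candidate; this is one plausible reading of the efficiency benefit the abstract attributes to two-stage retrieval.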