Properties of optimally weighted data fusion in CBMIR

Authors:
Peter Wilkins; Alan F. Smeaton; Paul Ferguson
Affiliations:
Dublin City University, Dublin, Ireland; Dublin City University, Dublin, Ireland; Dublin City University, Dublin, Ireland
Venue:
Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval
Year:
2010


Abstract

Content-Based Multimedia Information Retrieval (CBMIR) systems that leverage multiple retrieval experts (En) often employ a weighting scheme when combining expert results through data fusion. Typically, however, a query comprises multiple query images (Im), leading to potentially N × M weights to be assigned (one per pair of expert and query image). Because of the large number of potential weights, existing approaches impose a hierarchy for data fusion, such as uniformly combining the query image results from a single retrieval expert into a single list and then weighting the results of each expert. In this paper we demonstrate that this approach is sub-optimal and leads to the poor state of CBMIR performance in benchmarking evaluations. We use an optimization method known as Coordinate Ascent to discover the optimal set of weights (|En| ⋅ |Im|), which reveals a dramatic difference between known results and the theoretical maximum. We find that imposing common combinatorial hierarchies for data fusion halves the performance that could otherwise be achieved. By examining the optimal weight sets at the topic level, we observe that approximately 15% of the weights (from the set |En| ⋅ |Im|) for any given query are assigned 70%-82% of the total weight mass for that topic. Furthermore, we discover that the ideal distribution of weights follows a log-normal distribution. We find that we can achieve up to 88% of the performance of a fully optimized query using just these 15% of the weights. Our investigation was conducted on the TRECVID evaluations from 2003 to 2007 inclusive and on ImageCLEFPhoto 2007, totalling 181 search topics optimized over a combined collection size of 661,213 images and 1,594 topic images.
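The weight search described in the abstract can be illustrated with a minimal sketch of coordinate ascent over one weight per (expert, query image) result list. Everything specific here is an assumption for illustration, not the authors' implementation: the objective (average precision for a single topic), the fixed step size, the non-negative weights renormalized to sum to 1, the linear score combination, and the synthetic scores. The names `average_precision`, `fuse` and `coordinate_ascent` are hypothetical.

```python
import numpy as np

def average_precision(scores, relevant):
    """Average precision of the ranking induced by descending scores."""
    order = np.argsort(-scores, kind="stable")
    rel = relevant[order]
    if rel.sum() == 0:
        return 0.0
    hits = np.cumsum(rel)
    ranks = np.arange(1, len(rel) + 1)
    return float(np.sum((hits / ranks) * rel) / rel.sum())

def fuse(score_lists, weights):
    """Weighted linear combination of K result lists over D documents."""
    return weights @ score_lists  # (K,) @ (K, D) -> (D,)

def coordinate_ascent(score_lists, relevant, step=0.05, sweeps=20):
    """Adjust one weight at a time, keeping a change only if AP improves."""
    k = score_lists.shape[0]
    weights = np.full(k, 1.0 / k)  # assumed start: uniform combination
    best = average_precision(fuse(score_lists, weights), relevant)
    for _ in range(sweeps):
        improved = False
        for i in range(k):
            for delta in (step, -step):
                trial = weights.copy()
                trial[i] = max(0.0, trial[i] + delta)
                if trial.sum() == 0:
                    continue
                trial /= trial.sum()  # keep total weight mass at 1
                ap = average_precision(fuse(score_lists, trial), relevant)
                if ap > best:
                    weights, best, improved = trial, ap, True
        if not improved:  # no single-coordinate move helps any more
            break
    return weights, best

# Synthetic topic: 2 experts x 3 query images = 6 result lists, 50 documents.
rng = np.random.default_rng(0)
relevant = (rng.random(50) < 0.2).astype(int)
scores = rng.random((6, 50))
scores[2] += relevant  # make one (expert, image) list informative

uniform_ap = average_precision(fuse(scores, np.full(6, 1.0 / 6)), relevant)
weights, best = coordinate_ascent(scores, relevant)
print("uniform AP:", uniform_ap, "optimized AP:", best)
print("weights:", np.round(weights, 3))
```

Because the search starts from the uniform combination and accepts only improving moves, the optimized score can never fall below the uniform baseline on the topic it was tuned on. Like the paper's own optimal weights, the result is fitted against known relevance judgments, so it is best read as an upper bound for that topic rather than as a deployable weighting.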