Automatic weight selection for multi-metric distances

  • Authors: Juan Manuel Barrios; Benjamin Bustos
  • Affiliations: University of Chile; University of Chile
  • Venue: Proceedings of the Fourth International Conference on SImilarity Search and APplications
  • Year: 2011

Abstract

Content-Based Multimedia Information Retrieval retrieves multimedia documents based on their content (colors, edges, textures, etc.). The content of a whole multimedia document is represented by a global descriptor. The similarity of two multimedia documents can be defined as the distance between their descriptors. A multi-metric function that combines distances from many descriptors is usually more effective than any single descriptor. In this case, a different weight is assigned to each descriptor, representing its relative importance in the combination. Usually, these sets of weights are fixed manually or by performing many effectiveness evaluations. In this work, we present three novel techniques for weighting multi-metrics: α-normalization, which is a generalization of the normalization by maximum distance that uses the histogram of distances; MID-weighting, which selects weights that maximize intrinsic dimensionality; and MID-α-weighting, which combines the two previous techniques. These techniques enable the selection of a set of weights with satisfactory effectiveness without performing any effectiveness evaluation. Thus, they are suitable when a ground truth does not exist or when it is expensive to perform an evaluation. We tested their effectiveness on a content-based copy detection corpus, and we analyzed the behavior of effectiveness and efficiency in a multi-metric space. We conclude that MID-α-weighting outperforms the widely used maximum distance normalization, and that it can be used as an automatic weight selection for further manual adjustment.
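The abstract gives no formulas, so the following Python sketch is only an illustration of the ideas it names, not the authors' implementation. It assumes that a multi-metric is a weighted sum of per-descriptor distances, and that α-normalization can be approximated by scaling each distance by the α-quantile of a sample of its distances (so that α = 1 reduces to normalization by the maximum sampled distance). The intrinsic dimensionality estimate uses the common definition ρ = μ² / (2σ²), where μ and σ² are the mean and variance of the distance histogram. All function names, the dictionary layout of the objects, and the descriptor names "color" and "edge" are hypothetical.

```python
import math
import random
import statistics
from typing import Callable, Dict, List, Sequence

# Hypothetical object layout: descriptor name -> feature vector.
Descriptors = Dict[str, Sequence[float]]
Metric = Callable[[Descriptors, Descriptors], float]


def l1_on(name: str) -> Metric:
    """Build a metric: L1 distance between the descriptors stored under `name`."""
    def metric(a: Descriptors, b: Descriptors) -> float:
        return sum(abs(x - y) for x, y in zip(a[name], b[name]))
    return metric


def sample_distances(metric: Metric, objects: List[Descriptors],
                     n_pairs: int, rng: random.Random) -> List[float]:
    """Evaluate the metric on randomly chosen pairs of distinct objects."""
    return [metric(*rng.sample(objects, 2)) for _ in range(n_pairs)]


def alpha_threshold(distances: Sequence[float], alpha: float) -> float:
    """Smallest sampled distance tau with at least a fraction alpha of samples <= tau."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(distances)
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[index]


def alpha_weights(metrics: Sequence[Metric], objects: List[Descriptors],
                  alpha: float, n_pairs: int = 2000, seed: int = 0) -> List[float]:
    """Assumed alpha-normalization: weight each metric by 1 / (its alpha-quantile)."""
    rng = random.Random(seed)
    weights = []
    for metric in metrics:
        tau = alpha_threshold(sample_distances(metric, objects, n_pairs, rng), alpha)
        if tau == 0.0:
            raise ValueError("alpha-quantile is zero; choose a larger alpha")
        weights.append(1.0 / tau)
    return weights


def multi_metric(metrics: Sequence[Metric], weights: Sequence[float]) -> Metric:
    """Combine metrics into one distance as a weighted sum."""
    def combined(a: Descriptors, b: Descriptors) -> float:
        return sum(w * m(a, b) for w, m in zip(weights, metrics))
    return combined


def intrinsic_dimensionality(distances: Sequence[float]) -> float:
    """Estimate rho = mean^2 / (2 * variance) from a sample of distances."""
    mu = statistics.mean(distances)
    var = statistics.pvariance(distances)
    if var == 0:
        raise ValueError("distance sample has zero variance")
    return mu * mu / (2.0 * var)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic objects with two descriptors on very different scales.
    objects = [{"color": [rng.random() for _ in range(8)],
                "edge": [100 * rng.random() for _ in range(4)]}
               for _ in range(200)]
    metrics = [l1_on("color"), l1_on("edge")]
    weights = alpha_weights(metrics, objects, alpha=0.1)
    combined = multi_metric(metrics, weights)
    print("weights:", weights)
    print("rho:", intrinsic_dimensionality(sample_distances(combined, objects, 2000, rng)))
```

Under these assumptions, a MID-style selection would treat `intrinsic_dimensionality` of the combined distance as the objective and search for the weights that maximize it. The search procedure itself is not described in the abstract and is left out here.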