Adapting indexing trees to data distribution in feature spaces

  • Authors:
  • Xiaoning Qian; Hemant D. Tagare

  • Affiliations:
  • Xiaoning Qian: Department of Diagnostic Radiology, Yale University, New Haven, CT 06520, United States
  • Hemant D. Tagare: Department of Diagnostic Radiology, Yale University, New Haven, CT 06520, United States, and Department of Electrical Engineering, Yale University, New Haven, CT 06520, United States

  • Venue:
  • Computer Vision and Image Understanding
  • Year:
  • 2010

Abstract

Fast similarity retrieval is critical for content-based image retrieval systems. Tree indexing is a classical technique for fast retrieval, but the practical speedup offered by an indexing tree depends on the intrinsic dimension of the data: data with a low intrinsic dimension can be indexed more efficiently than data with a high intrinsic dimension. This suggests that an indexing tree adapted to the data distribution may be more efficient. This paper proposes two adaptation procedures that are guaranteed to improve indexing efficiency. The procedures are based on a formula for the average number of node tests incurred during retrieval. The formula shows explicitly how indexing performance varies with the distribution of feature points and the query. Greedy and optimal tree adaptation procedures are derived from the formula, and both explicitly enhance the retrieval performance of indexing trees. The optimally adapted tree carries the mathematical guarantee that it is the best-performing tree in the set of trees obtainable by node elimination. The adaptation procedures are applied to kdb-trees and hierarchical clustering trees for indexing synthetic as well as real data sets from medical image databases. Experimental results validate the claim that the adaptation procedures increase retrieval efficiency.
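
To make the idea of adaptation by node elimination concrete, the following is a minimal illustrative sketch, not the paper's formula or algorithm. It assumes a toy one-dimensional tree whose nodes carry an interval, estimates the average number of node tests empirically over a sample of range queries, and greedily eliminates an internal node (splicing its children into its parent) whenever that lowers the average. All names (`Node`, `count_tests`, `average_tests`, `eliminate`, `greedy_adapt`, `demo_tree`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Toy index node covering the interval [lo, hi]; leaves have no children."""
    lo: float
    hi: float
    children: List["Node"] = field(default_factory=list)


def count_tests(node: Node, q_lo: float, q_hi: float) -> int:
    """Count node tests for the range query [q_lo, q_hi] in this subtree."""
    tests = 1  # testing this node's interval against the query
    if node.hi < q_lo or node.lo > q_hi:
        return tests  # no overlap: the whole subtree is pruned
    for child in node.children:
        tests += count_tests(child, q_lo, q_hi)
    return tests


def average_tests(root: Node, queries: List[Tuple[float, float]]) -> float:
    """Empirical average of node tests over a sample of queries (an assumption:
    the paper uses an analytic formula, which is not reproduced here)."""
    return sum(count_tests(root, a, b) for a, b in queries) / len(queries)


def eliminate(parent: Node, index: int) -> Node:
    """Node elimination: replace parent.children[index] by its own children."""
    child = parent.children[index]
    parent.children[index:index + 1] = child.children
    return child


def greedy_adapt(root: Node, queries: List[Tuple[float, float]]) -> Node:
    """Repeatedly eliminate one internal node that lowers the average cost."""
    improved = True
    while improved:
        improved = False
        base = average_tests(root, queries)
        stack = [root]
        while stack and not improved:
            parent = stack.pop()
            for i, child in enumerate(parent.children):
                if not child.children:
                    continue  # leaves hold data and are never eliminated
                removed = eliminate(parent, i)
                if average_tests(root, queries) < base:
                    improved = True  # keep the change and restart the scan
                    break
                # Undo: put the eliminated node back in place of its children.
                parent.children[i:i + len(removed.children)] = [removed]
            if not improved:
                stack.extend(parent.children)
    return root


def demo_tree() -> Node:
    """A root with one redundant internal node above two selective clusters."""
    left = Node(0, 10, [Node(0, 2), Node(3, 5), Node(6, 8), Node(9, 10)])
    right = Node(90, 100, [Node(90, 92), Node(93, 95), Node(96, 98), Node(99, 100)])
    redundant = Node(0, 100, [left, right])  # same extent as the root
    return Node(0, 100, [redundant])
```

In this toy setting the redundant node is tested on every query without ever pruning anything, so eliminating it lowers the average, whereas the two cluster nodes prune four leaf tests for queries that miss them and are therefore kept. An exhaustive search over all subsets of eliminated nodes would be the brute-force analogue of an optimal adaptation; the paper derives its own procedures from its formula.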