Dynamic vp-tree indexing for n-nearest neighbor search given pair-wise distances

  • Authors:
  • Ada Wai-chee Fu;Polly Mei-shuen Chan;Yin-Ling Cheung;Yiu Sang Moon

  • Affiliations:
  • Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong/ e-mail: adafu@cse.cuhk.edu.hk;Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong/ e-mail: adafu@cse.cuhk.edu.hk;Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong/ e-mail: adafu@cse.cuhk.edu.hk;Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong/ e-mail: adafu@cse.cuhk.edu.hk

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

For some multimedia applications, it has been found that domain objects cannot be represented as feature vectors in a multidimensional space. Instead, pair-wise distances between data objects are the only input. To support content-based retrieval, one approach maps each object to a k-dimensional (k-d) point and tries to preserve the distances among the points. Then, existing spatial access index methods such as the R-trees and KD-trees can support fast searching on the resulting k-d points. However, information loss is inevitable with such an approach since the distances between data objects can only be preserved to a certain extent. Here we investigate the use of a distance-based indexing method. In particular, we apply the vantage point tree (vp-tree) method. There are two important problems for the vp-tree method that warrant further investigation, the n-nearest neighbors search and the updating mechanisms. We study an n-nearest neighbors search algorithm for the vp-tree, which is shown by experiments to scale up well with the size of the dataset and the desired number of nearest neighbors, n. Experiments also show that the searching in the vp-tree is more efficient than that for the $R^*$-tree and the M-tree. Next, we propose solutions for the update problem for the vp-tree, and show by experiments that the algorithms are efficient and effective. Finally, we investigate the problem of selecting vantage-point, propose a few alternative methods, and study their impact on the number of distance computation.