Indexing the Function: An Efficient Algorithm for Multi-dimensional Search with Expensive Distance Functions

  • Authors:
  • Hanxiong Chen; Jianquan Liu; Kazutaka Furuse; Jeffrey Xu Yu; Nobuo Ohbo

  • Affiliations:
  • Computer Science, University of Tsukuba, Ibaraki, Japan 305-8577; Computer Science, University of Tsukuba, Ibaraki, Japan 305-8577; Computer Science, University of Tsukuba, Ibaraki, Japan 305-8577; Systems Engineering & Engineering Management, Chinese University of Hong Kong, China; Computer Science, University of Tsukuba, Ibaraki, Japan 305-8577

  • Venue:
  • ADMA '09 Proceedings of the 5th International Conference on Advanced Data Mining and Applications
  • Year:
  • 2009

Abstract

Indexing structures based on space partitioning become ineffective in high-dimensional spaces because of the well-known "curse of dimensionality". For high-dimensional similarity search, a linear scan over approximated data is more efficient. However, existing approaches have concentrated on reducing I/O and ignored the computation cost. For an expensive distance function, such as the Lp norm with fractional p, computation becomes the bottleneck. We propose a new technique for expensive distance functions, "indexing the function", which pre-computes some key values of the function once. These values are then used to derive upper and lower bounds on the distance between each data vector and the query vector. The technique is highly efficient because it avoids most distance function computations; moreover, it requires no extra storage because no index is constructed or stored. Its efficiency is confirmed by cost analyses as well as by experiments on synthetic and real data.
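The bounding idea can be illustrated with a minimal sketch. It assumes one plausible reading of "key values": the power function t**p is tabulated once at a uniform grid of points on [0, 1], with all coordinates normalized to [0, 1]. Because t**p is monotonically increasing for t >= 0, a per-dimension difference that falls between two grid points is bracketed by the two tabulated values, so the whole sum can be bounded without evaluating any fractional power. The names build_power_table, bound_sum and knn_with_bounds, the grid of 64 levels, and the k-NN pruning rule are hypothetical and are not taken from the paper.

```python
import random


def build_power_table(p, levels):
    """Tabulate t**p at the grid points t = j/levels, j = 0..levels.

    Hypothetical 'function index': computed once per exponent p, independent of the data.
    """
    return [(j / levels) ** p for j in range(levels + 1)]


def bound_sum(x, q, table, levels):
    """Return (lower, upper) bounds on sum_i |x_i - q_i|**p using table lookups only.

    Assumes every coordinate lies in [0, 1], so each difference lies in [0, 1].
    Since t**p increases with t >= 0, a difference in [j/levels, (j+1)/levels]
    satisfies table[j] <= diff**p <= table[j + 1].
    """
    lower = upper = 0.0
    for xi, qi in zip(x, q):
        cell = min(int(abs(xi - qi) * levels), levels - 1)  # grid cell of the difference
        lower += table[cell]
        upper += table[cell + 1]
    return lower, upper


def knn_with_bounds(data, q, k, p, levels=64):
    """k-NN under the fractional Lp norm, computing exact powers only for unpruned points.

    Returns (indices of the k nearest points, number of exact distance computations).
    Assumes 1 <= k <= len(data).
    """
    table = build_power_table(p, levels)
    bounds = [bound_sum(x, q, table, levels) for x in data]
    # At least k points have true sums <= their upper bounds <= kth_upper,
    # so any point whose lower bound exceeds kth_upper cannot be among the k nearest.
    kth_upper = sorted(upper for _, upper in bounds)[k - 1]
    candidates = [i for i, (lower, _) in enumerate(bounds) if lower <= kth_upper]
    # Expensive step: exact sum of fractional powers, only for the candidates.
    exact = sorted(
        (sum(abs(a - b) ** p for a, b in zip(data[i], q)), i) for i in candidates
    )
    return [i for _, i in exact[:k]], len(candidates)


if __name__ == "__main__":
    random.seed(0)
    dim, n, p = 16, 2000, 0.5
    data = [[random.random() for _ in range(dim)] for _ in range(n)]
    query = [random.random() for _ in range(dim)]
    nearest, n_exact = knn_with_bounds(data, query, k=5, p=p)
    print("nearest:", nearest)
    print(f"exact distance computations: {n_exact} of {n}")
```

Ranking by the sum of powers is equivalent to ranking by the Lp distance, since x**(1/p) is monotonic, so the final root is never computed. A power-of-two grid size keeps j/levels exact in binary floating point. The table depends only on p and the grid size, not on the data, which is consistent with the claim that no index needs to be stored.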