Approximate Retrieval of High-Dimensional Data by Spatial Indexing

  • Authors:
  • Takeshi Shinohara;Jiyuan An;Hiroki Ishizaka

  • Venue:
  • DS '98 Proceedings of the First International Conference on Discovery Science
  • Year:
  • 1998

Abstract

High-dimensional data, such as documents, digital images, and audio clips, can be considered as spatial objects, which induce a metric space where the metric can be used to measure dissimilarities between objects. We propose a method for retrieving objects within some distance from a given object by utilizing the R-tree, a spatial indexing/access method. Since the R-tree usually assumes a Euclidean metric, we have to embed objects into a Euclidean space. However, some naturally defined distance measures, such as L1 distance (or Manhattan distance), cannot be embedded into any Euclidean space. First, we prove that objects in a discrete L1 metric space can be embedded into vertices of a unit hypercube when the square root of the L1 distance is used as the distance. To take full advantage of R-tree spatial indexing, we have to project objects into a space of relatively low dimension. We adopt FastMap by Faloutsos and Lin to reduce the dimension of the object space. The range corresponding to a query (Q, h), which retrieves objects within distance h from an object Q, is naturally considered as a hyper-sphere even after FastMap projection, which is an orthogonal projection in Euclidean space. However, it turns out that when FastMap is applied to objects embedded in the above-mentioned way, the query range is contracted into a hyper-box smaller than the hyper-sphere. Finally, we give a brief summary of experiments in applying our method to Japanese chess boards.
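
The embedding claim in the abstract can be illustrated with a small sketch. It assumes a unary (thermometer) encoding of bounded non-negative integer coordinates, which is one standard way to obtain such an embedding and is not necessarily the construction used in the paper's proof. The names unary_embed, l1, and euclidean, and the bound max_value, are hypothetical helpers introduced only for this illustration.

```python
import math

def unary_embed(vec, max_value):
    """Map integers in [0, max_value] to a 0/1 vector (a unit hypercube vertex).

    Assumption: each coordinate v becomes v ones followed by
    (max_value - v) zeros, so two encodings differ in exactly |v - w| bits.
    """
    bits = []
    for v in vec:
        bits.extend([1] * v + [0] * (max_value - v))
    return bits

def l1(x, y):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Two example objects with coordinates in {0, ..., 4} (made-up values).
p = [3, 0, 2]
q = [1, 4, 2]
ep = unary_embed(p, 4)
eq = unary_embed(q, 4)

# The Euclidean distance between the embedded vertices equals the
# square root of the original L1 distance.
print(l1(p, q), euclidean(ep, eq), math.sqrt(l1(p, q)))
```

Because the embedded coordinates are 0 or 1, the squared Euclidean distance simply counts differing bits, which is why it coincides with the L1 distance of the original objects. The price is a dimension of len(vec) * max_value, which is consistent with the abstract's motivation for a subsequent dimension reduction step.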