Spatial indexing of high-dimensional data based on relative approximation

  • Authors:
  • Yasushi Sakurai;Masatoshi Yoshikawa;Shunsuke Uemura;Haruhiko Kojima

  • Affiliations:
  • NTT Cyber Space Laboratories, 1-1 Hikari-no-oka, Yokosuka, Kanagawa 239-0847, Japan/ e-mail: sakurai.yasushi&commat/lab.ntt.co.jp, kojima&commat/aether.hil.ntt.co.jp;Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0101, Japan/ e-mail: &rcub/yosikawa, uemura&rcub/&commat/is.aist-nara.ac.jp;Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0101, Japan/ e-mail: &rcub/yosikawa, uemura&rcub/&commat/is.aist-nara.ac.jp;NTT Cyber Space Laboratories, 1-1 Hikari-no-oka, Yokosuka, Kanagawa 239-0847, Japan/ e-mail: sakurai.yasushi&commat/lab.ntt.co.jp, kojima&commat/aether.hil.ntt.co.jp

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a novel index structure, the A-tree (approximation tree), for similarity searches in high-dimensional data. The basic idea of the A-tree is the introduction of virtual bounding rectangles (VBRs) which contain and approximate MBRs or data objects. VBRs can be represented quite compactly and thus affect the tree configuration both quantitatively and qualitatively. First, since tree nodes can contain a large number of VBR entries, fanout becomes large, which increases search speed. More importantly, we have a free hand in arranging MBRs and VBRs in the tree nodes. Each A-tree node contains an MBR and its children VBRs. Therefore, by fetching an A-tree node, we can obtain information on the exact position of a parent MBR and the approximate position of its children. We have performed experiments using both synthetic and real data sets. For the real data sets, the A-tree outperforms the SR-tree and the VA-file in all dimensionalities up to 64 dimensions, which is the highest dimension in our experiments. Additionally, we propose a cost model for the A-tree. We verify the validity of the cost model for synthetic and real data sets.