VAST-Tree: a vector-advanced and compressed structure for massive data tree traversal

  • Authors:
  • Takeshi Yamamuro;Makoto Onizuka;Toshio Hitaka;Masashi Yamamuro

  • Affiliations:
  • NTT Cyber Space Laboratories, NTT corporation;NTT Cyber Space Laboratories, NTT corporation;NTT Cyber Space Laboratories, NTT corporation;NTT Cyber Space Laboratories, NTT corporation

  • Venue:
  • Proceedings of the 15th International Conference on Extending Database Technology
  • Year:
  • 2012

Abstract

We propose a compact and efficient index structure for massive data sets. Several indexing techniques, such as binary trees and B+trees, are well known and widely used. Unfortunately, we find that these techniques suffer from two major shortcomings when applied to massive data sets: first, their indices are so large that they could overflow regular main memory, and second, they suffer from a variety of penalties (e.g., conditional branches, low cache hits, and TLB misses), which restrict the number of instructions executed per processor cycle. Our state-of-the-art index structure, called VAST-Tree, classifies branch nodes into multiple layers. It applies existing techniques such as cache-conscious, aligned, and branch-free structures to the top layers of branch nodes in trees. Next, it applies an adaptive compression technique to the middle and bottom layers of branch nodes, which saves space and harnesses data parallelism with SIMD instructions. Moreover, a processor-friendly compression technique is applied to leaf nodes. The end result is that trees are much more compact and traversal efficiency is high. We implement a prototype and compare its index size and performance with those of binary trees and of FAST, the hardware-conscious technique that currently offers the highest performance. Compared to current alternatives, VAST-Tree compacts the branch nodes by more than 95%, and the overall index size by 47-84%, given 2^30 keys. With 2^28 keys, it achieves roughly 6.0 times and 1.24 times the throughput, and reduces memory consumption by more than 94.7% and 40.5%, compared to binary trees and FAST, respectively.
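
The two ideas the abstract combines, branch-free search inside a node and searching directly over compressed keys, can be illustrated with a minimal sketch. This is not the VAST-Tree implementation: the function names (`child_index`, `compress_node`, `child_index_compressed`) are hypothetical, and simple frame-of-reference offsets stand in for the paper's adaptive compression, whose details the abstract does not give. Plain Python is used for readability; a real implementation would perform the comparisons with SIMD instructions.

```python
from typing import List, Tuple


def child_index(node_keys: List[int], probe: int) -> int:
    """Pick the child to descend into without an early-exit branch.

    Every key is compared and the 0/1 results are summed. A SIMD version
    would run the same comparisons in parallel lanes and count the hits.
    """
    return sum(int(k <= probe) for k in node_keys)


def compress_node(node_keys: List[int]) -> Tuple[int, List[int]]:
    """Frame-of-reference encoding (an assumed stand-in for the paper's
    scheme): keep the smallest key and store the rest as small offsets."""
    base = node_keys[0]  # keys in a node are sorted, so this is the minimum
    return base, [k - base for k in node_keys]


def child_index_compressed(base: int, deltas: List[int], probe: int) -> int:
    """Same counting search, performed directly on the compressed offsets."""
    target = probe - base
    return sum(int(d <= target) for d in deltas)


keys = [100, 140, 190, 255]          # sorted separator keys of one node
base, deltas = compress_node(keys)   # offsets are narrower than the keys
```

Smaller offsets need fewer bits, so more of them fit into one SIMD register and one cache line, which is the general reason compression and data parallelism reinforce each other in this kind of design.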