FAST: fast architecture sensitive tree search on modern CPUs and GPUs

  • Authors:
  • Changkyu Kim; Jatin Chhugani; Nadathur Satish; Eric Sedlar; Anthony D. Nguyen; Tim Kaldewey; Victor W. Lee; Scott A. Brandt; Pradeep Dubey

  • Affiliations:
  • Intel Corporation, Santa Clara, CA, USA; Intel Corporation, Santa Clara, CA, USA; Intel Corporation, Santa Clara, CA, USA; Oracle Corporation, Redwood Shores, CA, USA; Intel Corporation, Santa Clara, CA, USA; Oracle Corporation, Redwood Shores, CA, USA; Intel Corporation, Santa Clara, CA, USA; University of California at Santa Cruz, Santa Cruz, CA, USA; Intel Corporation, Santa Clara, CA, USA

  • Venue:
  • Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2010

Abstract

In-memory tree-structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join, and aggregation. However, unlike other primitives, tree search presents significant challenges due to irregular and unpredictable data accesses in tree traversal. In this paper, we present FAST, an extremely fast architecture-sensitive layout of the index tree. FAST is a binary tree logically organized to optimize for architecture features like page size, cache line size, and SIMD width of the underlying hardware. FAST eliminates the impact of memory latency, and exploits thread-level and data-level parallelism on both CPUs and GPUs to achieve 50 million (CPU) and 85 million (GPU) queries per second, 5X (CPU) and 1.7X (GPU) faster than the best previously reported performance on the same architectures. FAST supports efficient bulk updates by rebuilding index trees in less than 0.1 seconds for datasets as large as 64M keys, and it naturally integrates compression techniques, overcoming the memory bandwidth bottleneck and achieving a 6X performance improvement over uncompressed index search for large keys on CPUs.
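
To make the idea of an architecture-sensitive layout concrete, the following is a minimal sketch of a blocked binary search tree, not the paper's implementation. It applies a single level of blocking, whereas the abstract describes a layout sized for page, cache line, and SIMD width together. The function names `build_blocked` and `contains`, the default block depth of 2 (three keys per block), the requirement that the key count be 2^h - 1 with h a multiple of the block depth, and the scalar comparisons standing in for a SIMD compare are all assumptions made for illustration.

```python
def build_blocked(sorted_keys, block_depth=2):
    """Reorder distinct sorted keys so each small subtree is stored contiguously.

    Assumption: len(sorted_keys) == 2**h - 1 and h is a multiple of block_depth.
    """
    n = len(sorted_keys)
    height = (n + 1).bit_length() - 1
    if n == 0 or (1 << height) - 1 != n or height % block_depth != 0:
        raise ValueError("need 2**h - 1 keys with h a multiple of block_depth")

    keys_per_block = (1 << block_depth) - 1   # keys in one block (subtree)
    fanout = 1 << block_depth                 # child blocks per block
    layout = [None] * n

    def fill(lo, h, block):
        # The subtree of height h covers sorted_keys[lo : lo + 2**h - 1].
        base = block * keys_per_block
        slot = 0
        # Store the top block_depth levels of this subtree in breadth-first order.
        for level in range(block_depth):
            stride = 1 << (h - level - 1)
            for j in range(1 << level):
                layout[base + slot] = sorted_keys[lo + (2 * j + 1) * stride - 1]
                slot += 1
        # Recurse into the child blocks, numbered as an implicit fanout-ary tree.
        if h > block_depth:
            child_span = 1 << (h - block_depth)
            for c in range(fanout):
                fill(lo + c * child_span, h - block_depth, block * fanout + 1 + c)

    fill(0, height, 0)
    return layout


def contains(layout, query, block_depth=2):
    """Membership test that walks one contiguous block at a time."""
    keys_per_block = (1 << block_depth) - 1
    fanout = 1 << block_depth
    num_blocks = len(layout) // keys_per_block
    block = 0
    while block < num_blocks:
        base = block * keys_per_block
        node = 0
        # Scalar stand-in for comparing the query against a whole block at once.
        for _ in range(block_depth):
            key = layout[base + node]
            if query == key:
                return True
            node = 2 * node + 1 + (query > key)
        # node now names a virtual leaf of the block; map it to a child block.
        block = block * fanout + 1 + (node - keys_per_block)
    return False


layout = build_blocked(list(range(15)))
print(layout)  # [7, 3, 11, 1, 0, 2, 5, 4, 6, 9, 8, 10, 13, 12, 14]
```

In this sketch each search step touches one contiguous group of keys instead of jumping across the array level by level, which is the property that blocking for cache lines or pages aims at. A vectorized version would compare the query against all keys of a block in one instruction and use the resulting mask to pick the child block.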