FAST: fast architecture sensitive tree search on modern CPUs and GPUs

Authors:
Changkyu Kim;Jatin Chhugani;Nadathur Satish;Eric Sedlar;Anthony D. Nguyen;Tim Kaldewey;Victor W. Lee;Scott A. Brandt;Pradeep Dubey
Affiliations:
Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Oracle Corporation, Redwood Shores, CA, USA;Intel Corporation, Santa Clara, CA, USA;Oracle Corporation, Redwood Shores, CA, USA;Intel Corporation, Santa Clara, CA, USA;University of California at Santa Cruz, Santa Cruz, CA, USA;Intel Corporation, Santa Clara, CA, USA
Venue:
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Year:
2010



Abstract

In-memory tree-structured index search is a fundamental database operation. Modern processors provide tremendous computing power by integrating multiple cores, each with wide vector units. There has been much work to exploit modern processor architectures for database primitives like scan, sort, join, and aggregation. However, unlike other primitives, tree search presents significant challenges due to irregular and unpredictable data accesses in tree traversal. In this paper, we present FAST, an extremely fast architecture-sensitive layout of the index tree. FAST is a binary tree logically organized to optimize for architecture features like page size, cache line size, and SIMD width of the underlying hardware. FAST eliminates the impact of memory latency, and exploits thread-level and data-level parallelism on both CPUs and GPUs to achieve 50 million (CPU) and 85 million (GPU) queries per second, 5X (CPU) and 1.7X (GPU) faster than the best previously reported performance on the same architectures. FAST supports efficient bulk updates by rebuilding index trees in less than 0.1 seconds for datasets as large as 64M keys, and it naturally integrates compression techniques, overcoming the memory bandwidth bottleneck and achieving a 6X performance improvement over uncompressed index search for large keys on CPUs.
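
As a rough illustration of what a layout-oriented index tree means, the sketch below stores a binary search tree as a flat array in breadth-first order and searches it without pointers. This is not the FAST layout described in the abstract, which additionally groups nodes into blocks sized to the SIMD width, cache line, and page; it is a simplified, assumed stand-in for the underlying idea of choosing the memory order of tree nodes to suit the traversal. The function names `build_bfs_layout` and `lower_bound` are hypothetical.

```python
def build_bfs_layout(sorted_keys):
    """Place sorted keys into a 1-based array so that the children of
    node i sit at 2*i and 2*i + 1 (breadth-first order)."""
    n = len(sorted_keys)
    tree = [None] * (n + 1)  # index 0 is unused
    keys = iter(sorted_keys)

    def fill(node):
        # An in-order walk over the implicit tree assigns keys in sorted
        # order, which yields a valid binary search tree.
        if node <= n:
            fill(2 * node)
            tree[node] = next(keys)
            fill(2 * node + 1)

    fill(1)
    return tree


def lower_bound(tree, query):
    """Return the smallest stored key >= query, or None if there is none."""
    n = len(tree) - 1
    node = 1
    best = None
    while node <= n:
        if tree[node] >= query:
            best = tree[node]      # candidate answer; look for a smaller one
            node = 2 * node        # descend left
        else:
            node = 2 * node + 1    # descend right
    return best


tree = build_bfs_layout([1, 3, 5, 7, 9, 11, 13])
print(tree[1:])              # breadth-first order of the keys
print(lower_bound(tree, 6))  # smallest key >= 6
```

Because child positions are computed from the parent index, each step of the search is a comparison plus index arithmetic, with no pointer to load. A blocked layout of the kind the abstract describes would go further by keeping several consecutive levels of the tree physically adjacent.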