Red Fox: An Execution Environment for Relational Query Processing on GPUs

Authors:
Haicheng Wu;Gregory Diamos;Tim Sheard;Molham Aref;Sean Baxter;Michael Garland;Sudhakar Yalamanchili
Affiliations:
Georgia Institute of Technology;NVIDIA;Portland State University;LogicBlox Inc.;NVIDIA;NVIDIA;Georgia Institute of Technology
Venue:
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2014

Abstract

Modern enterprise applications are an emerging application area that requires processing queries and computations over massive amounts of data. Large-scale, multi-GPU cluster systems potentially present a vehicle for major improvements in throughput and, consequently, overall performance. However, throughput improvement using GPUs is challenged by the distinctive memory and computational characteristics of the Relational Algebra (RA) operators that are central to queries for answering business questions. This paper introduces the design, implementation, and evaluation of Red Fox, a compiler and runtime infrastructure for executing relational queries on GPUs. Red Fox comprises (i) a language front-end for LogiQL, a commercial query language, (ii) an RA-to-GPU compiler, (iii) optimized GPU implementations of RA operators, and (iv) a supporting runtime. We report performance on the full set of industry-standard TPC-H queries on a single-node GPU system. Compared with a commercial LogiQL system implementation optimized for a state-of-the-art CPU machine, Red Fox is on average 6.48x faster, including PCIe transfer time. We point out key bottlenecks, propose potential solutions, and analyze the GPU implementation of these queries. To the best of our knowledge, this is the first reported end-to-end compilation and execution infrastructure that supports the full set of TPC-H queries on commodity GPUs.
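
For readers unfamiliar with the RA operators the abstract refers to, the following is a minimal CPU-only sketch in Python of how a query can be expressed as a chain of such operators (SELECT, JOIN, PROJECT, and a grouped sum). It is an illustration only and is not Red Fox's implementation, its LogiQL front-end, or its GPU kernels. All function names, table names, and data values here are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def select(rows, predicate):
    """SELECT: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """PROJECT: keep only the named columns of each row."""
    return [{c: row[c] for c in columns} for row in rows]


def join(left, right, key):
    """Equi-JOIN on a shared column, using a hash index built over `right`."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def sum_by(rows, group_col, value_col):
    """Grouped aggregation: sum `value_col` for each distinct `group_col`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    return dict(totals)


# Toy tables (made-up data, loosely shaped like an orders/customers schema).
orders = [
    {"order_id": 1, "cust_id": 10, "price": 100},
    {"order_id": 2, "cust_id": 11, "price": 250},
    {"order_id": 3, "cust_id": 10, "price": 40},
]
customers = [
    {"cust_id": 10, "nation": "FR"},
    {"cust_id": 11, "nation": "DE"},
]


def revenue_by_nation(orders, customers, min_price):
    """A small query plan: filter, join, project, then aggregate."""
    big = select(orders, lambda r: r["price"] >= min_price)
    joined = join(big, customers, "cust_id")
    slim = project(joined, ["nation", "price"])
    return sum_by(slim, "nation", "price")


result = revenue_by_nation(orders, customers, 50)
print(result)
```

Each operator in this sketch materializes a full intermediate table before the next one runs. On a GPU, operators of this kind are typically data-parallel over rows, and the intermediate results and host-device transfers are one plausible place for the memory-related costs the abstract mentions to show up.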