Modern enterprise applications represent an emerging application arena that requires processing queries and computations over massive amounts of data. Large-scale, multi-GPU cluster systems are a potential vehicle for major improvements in throughput and, consequently, overall performance. However, throughput improvement using GPUs is challenged by the distinctive memory and computational characteristics of the Relational Algebra (RA) operators that are central to queries for answering business questions. This paper introduces the design, implementation, and evaluation of Red Fox, a compiler and runtime infrastructure for executing relational queries on GPUs. Red Fox comprises i) a language front-end for LogiQL, a commercial query language, ii) an RA-to-GPU compiler, iii) optimized GPU implementations of RA operators, and iv) a supporting runtime. We report performance on the full set of industry-standard TPC-H queries on a single-node GPU. Compared with a commercial LogiQL system implementation optimized for a state-of-the-art CPU machine, Red Fox is on average 6.48x faster, including PCIe transfer time. We point out key bottlenecks, propose potential solutions, and analyze the GPU implementation of these queries. To the best of our knowledge, this is the first reported end-to-end compilation and execution infrastructure that supports the full set of TPC-H queries on commodity GPUs.