GPU join processing revisited

Authors:
Tim Kaldewey;Guy Lohman;Rene Mueller;Peter Volk
Affiliations:
IBM Almaden Research, San Jose, CA;IBM Almaden Research, San Jose, CA;IBM Almaden Research, San Jose, CA;Technische Universität Dresden
Venue:
DaMoN '12 Proceedings of the Eighth International Workshop on Data Management on New Hardware
Year:
2012

Abstract

Until recently, the use of graphics processing units (GPUs) for query processing was limited by the amount of memory on the graphics card, a few gigabytes at best. Moreover, input tables had to be copied to GPU memory before they could be processed, and after computation was completed, query results had to be copied back to CPU memory. The newest generation of Nvidia GPUs and development tools introduces a common memory address space, which now allows the GPU to access CPU memory directly, lifting size limitations and obviating data copy operations. We confirm that this new technology can sustain 98% of its nominal rate of 6.3 GB/sec in practice, and exploit it to process database hash joins at the same rate, i.e., the join is processed "on the fly" as the GPU reads the input tables from CPU memory at PCI-E speeds. Compared to the fastest published results for in-memory joins on the CPU, this represents more than half an order of magnitude speed-up. All of our results include the cost of result materialization (often omitted in earlier work), and we investigate the implications of changing join predicate selectivity and table size.
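As a rough illustration of the terms used in the abstract, the sketch below shows a plain equi-join in the hash-join style (build a hash table on one input, probe it with the other, and materialize every matching pair), together with the arithmetic behind the quoted sustained transfer rate. It is a CPU-side Python sketch for exposition only, not the authors' GPU implementation; the function names and the `(key, payload)` row layout are assumptions made for this example.

```python
from collections import defaultdict

# Figures quoted in the abstract.
NOMINAL_GB_PER_SEC = 6.3    # nominal transfer rate
SUSTAINED_FRACTION = 0.98   # fraction reported as sustained in practice


def sustained_rate_gb_per_sec():
    """Sustained rate implied by the abstract: 98% of 6.3 GB/sec."""
    return NOMINAL_GB_PER_SEC * SUSTAINED_FRACTION


def hash_join(build_rows, probe_rows):
    """Equi-join two lists of (key, payload) tuples (hypothetical layout).

    Build phase: hash the (ideally smaller) build input on its key.
    Probe phase: look up each probe row and materialize every match
    as a (key, build_payload, probe_payload) tuple.
    """
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)

    result = []  # materialized join output
    for key, payload in probe_rows:
        # .get() avoids inserting empty buckets for non-matching keys.
        for build_payload in table.get(key, ()):
            result.append((key, build_payload, payload))
    return result


if __name__ == "__main__":
    build = [(1, "a"), (2, "b"), (2, "c")]
    probe = [(2, "x"), (3, "y"), (1, "z")]
    print(hash_join(build, probe))
    print(round(sustained_rate_gb_per_sec(), 3))
```

The size of the materialized result depends on how many probe rows find matches, which is why join predicate selectivity affects the cost of result materialization.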