Optimizing select conditions on GPUs

Authors:
Evangelia A. Sitaridi; Kenneth A. Ross
Affiliations:
Columbia University; Columbia University
Venue:
Proceedings of the Ninth International Workshop on Data Management on New Hardware
Year:
2013

Abstract

Implementations of data processing operators on GPU processors have achieved significant performance improvements over their multicore CPU counterparts. To achieve maximum performance, database operator implementations must take into account the special features of GPU architectures. A crucial difference is that the unit of execution is a group ("warp") of threads, 32 threads in our target architecture, as opposed to a single thread on CPUs. In the presence of branches, all threads in a warp have to follow the same execution path; if some threads diverge, the different paths are serialized. Additionally, as on CPUs, branches degrade the efficiency of instruction scheduling. Here, we study conjunctive selection queries where branching hurts performance. We compute the optimal execution plan for a conjunctive query, taking branch penalties into account, and consider both single-kernel and multi-kernel plans. Our evaluation suggests that divergence affects performance significantly and that our techniques reduce resource underutilization and improve operator performance.
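
The effect of warp-level execution on a conjunctive selection can be illustrated with a toy model. The sketch below is not the paper's cost model or plan optimizer; it only counts how many predicate evaluations a warp has to issue, under the assumption that a warp issues a predicate whenever at least one of its 32 threads still holds a qualifying row. The function names `warp_steps_branching` and `warp_steps_no_branch` are hypothetical and chosen for this illustration.

```python
WARP_SIZE = 32  # warp width stated in the abstract


def warp_steps_branching(rows, predicates):
    """Count predicate evaluations issued when threads short-circuit.

    Toy assumption: one thread per row, and a warp issues predicate k
    if at least one of its threads has a row that passed predicates 0..k-1.
    """
    issued = 0
    for start in range(0, len(rows), WARP_SIZE):
        live = rows[start:start + WARP_SIZE]
        for pred in predicates:
            if not live:
                break  # every thread in this warp already failed
            issued += 1
            live = [r for r in live if pred(r)]
    return issued


def warp_steps_no_branch(rows, predicates):
    """Count predicate evaluations issued when every predicate is always evaluated."""
    n_warps = (len(rows) + WARP_SIZE - 1) // WARP_SIZE
    return n_warps * len(predicates)


if __name__ == "__main__":
    rows = list(range(128))  # 4 warps of 32 rows

    # Both first predicates select 4 of 128 rows, but place them differently.
    scattered = [lambda r: r % 32 == 0, lambda r: True]  # one hit per warp
    clustered = [lambda r: r < 4, lambda r: True]        # all hits in warp 0

    print(warp_steps_branching(rows, scattered))  # 8
    print(warp_steps_branching(rows, clustered))  # 5
    print(warp_steps_no_branch(rows, scattered))  # 8
```

In this model, two first predicates with the same selectivity lead to different amounts of issued work: when qualifying rows are scattered across warps, short-circuiting saves nothing over evaluating every predicate, which is one way to see why branch penalties matter when choosing a plan.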